POLYKETIDE-ASSOCIATED SUGAR BIOSYNTHESIS GENES

This application claims the benefit of U.S. Serial No. 08/576,626 filed December 21,
1995, now pending.

Field of the Invention 
The present invention relates to methods for directing the biosynthesis of specific 
polyketide analogs by genetic manipulation. In particular, sugar biosynthesis genes are 
manipulated to produce precise, novel glycosylation-modified macrolides of predicted 
structure. 

Background of the Invention 

Polyketides are a large class of natural products that includes many important 
antibiotic, antifungal, anticancer, and anti-helminthic compounds such as erythromycins, 
amphotericins, daunorubicins, and avermectins. Their synthesis proceeds by an ordered 
condensation of acyl esters to generate carbon chains of varying length, side chain, and 
reduction pattern that are differentially cyclized and subsequently modified to give the mature 
polyketides. For many polyketides, maturation includes the addition of one or more sugar 
residues to the cyclized carbon chain. The sugar residues are frequently critical to the 
biological activity of the mature polyketide. 

Streptomyces and the closely related Saccharopolyspora genera are prodigious 
producers of polyketide metabolites. Because of the commercial significance of these 
compounds, a great amount of effort has been expended in the study of Streptomyces 
genetics. Consequently, much is known about Streptomyces and several cloning vectors exist 
for introducing DNA into these organisms. 

Although many polyketides have been identified, there remains the need to obtain
novel glycosylation-modified (as defined herein) polyketide structures with enhanced
properties. Current methods of obtaining such molecules include screening of biological
samples and chemical modification of existing polyketides, both of which are costly and time
consuming. Current screening methods are based on gross properties of the molecule, e.g.
antibacterial or antifungal activity, so that both a priori knowledge of the structure of the
molecules obtained and predetermination of enhanced properties are virtually impossible.
Standard chemical modification of existing structures has been successfully employed, but is
limited in the number of types of compounds obtainable. Furthermore, the poor yield of
multistep chemical syntheses often limits the practicality of this approach. The following
modifications to sugar residues bound to polyketides are particularly difficult or inefficient at
the present time: changing the stereochemistry of specific hydroxyl or methyl groups, changing
the oxidation state of specific hydroxyl groups, and deoxygenating specific carbons.
Accordingly, there exists a need to obtain molecules wherein such changes are specified and
performed, which would represent an improvement in the technology to produce altered
glycosylation-modified polyketide molecules with predicted structure.

The present invention overcomes these problems by providing the genetic sequence of 
sugar biosynthesis genes involved in the biosynthesis of polyketide-associated sugars. 


Summary of the Invention 
In one aspect, the present invention provides an isolated single or double stranded
polynucleotide, typically DNA, having a nucleotide sequence which comprises (a) a
nucleotide sequence selected from the group consisting of (i) the sense sequence of FIG. 4A
(SEQ ID NO:1) from about nucleotide position 54 to about nucleotide position 1136; (ii) the
sense sequence of SEQ ID NO:1 from about nucleotide position 1147 to about nucleotide
position 2412; (iii) the sense sequence of SEQ ID NO:1 from about nucleotide position 2409
to about nucleotide position 3410; (iv) the sense sequence of FIG. 4B (SEQ ID NO:2) from
about nucleotide position 80 to about nucleotide position 1048; (v) the sense sequence of
SEQ ID NO:2 from about nucleotide position 1048 to about nucleotide position 2295; (vi) the
sense sequence of SEQ ID NO:2 from about nucleotide position 2348 to about nucleotide
position 3061; (vii) the sense sequence of SEQ ID NO:2 from about nucleotide position 3214
to about nucleotide position 4677; (viii) the sense sequence of SEQ ID NO:2 from about
nucleotide position 4674 to about nucleotide position 5879; (ix) the sense sequence of SEQ
ID NO:2 from about nucleotide position 5917 to about nucleotide position 7386; and (x) the
sense sequence of SEQ ID NO:2 from about nucleotide position 7415 to about nucleotide
position 7996; (b) sequences complementary to the sequences of (a); (c) sequences that, on
expression, encode a polypeptide encoded by the sequences of (a); and (d) analogous
sequences that hybridize under stringent conditions to the sequences of (a) and (b). A
preferred molecule is a DNA molecule. In another embodiment, the polynucleotide is an
RNA molecule.

In another embodiment, a DNA molecule of the present invention is contained in an
expression vector. The expression vector preferably further comprises an enhancer-promoter
operatively linked to the polynucleotide. In a preferred embodiment, the DNA molecule in
the vector is one of the preferred sequences mentioned above. In an especially preferred
embodiment, the DNA molecule in the vector is the sequence of SEQ ID NO:2 from about
nucleotide position 80 to about nucleotide position 1048.

The present invention still further provides for a host cell transformed with a
polynucleotide or expression vector of this invention. Preferably, the host cell is a bacterial
cell selected from the group consisting of Saccharopolyspora spp., Streptomyces spp. and E.
coli.

The present invention also provides methods to produce novel glycosylation-modified
polyketide structures by designing and introducing specified changes in the DNA governing
the synthesis and attachment of sugar residues to polyketides. According to one method, the
biosynthesis of specific glycosylation-modified polyketides is accomplished by genetic
manipulation of a polyketide-producing microorganism comprising the steps of isolating a
sugar biosynthesis gene-containing DNA sequence from those described above; identifying
within the gene-containing DNA sequence one or more DNA fragments responsible for the
biosynthesis of a polyketide-associated sugar or its attachment to the polyketide; creating one
or more specified changes in the DNA fragment or fragments, thereby resulting in an
altered DNA sequence; introducing the altered DNA sequence into a polyketide-producing
microorganism to replace the original sequence whereby the altered DNA sequence, when
translated, results in altered enzymatic activity capable of effecting the production of the
specific glycosylation-modified polyketide; growing a culture of the altered
polyketide-producing microorganism under conditions suitable for the formation of the specific
glycosylation-modified polyketide; and isolating said specific glycosylation-modified
polyketide from the culture.

In a second method the biosynthesis of specific glycosylation-modified polyketides is
accomplished by isolating a sugar biosynthesis gene-containing DNA sequence from
those described above; identifying within the gene-containing DNA sequence one or more
DNA fragments responsible for the biosynthesis of a polyketide-associated sugar or its
attachment to the polyketide; reversing the strand orientation of the DNA fragment or
fragments, thereby resulting in an altered DNA sequence which, when transcribed, results in
production of an antisense mRNA; introducing the altered DNA sequence into a
polyketide-producing microorganism having an mRNA capable of binding to the antisense mRNA,
which results in altered enzymatic activity capable of effecting the production of the specific
glycosylation-modified polyketide; growing a culture of the altered polyketide-producing
microorganism under conditions suitable for the formation of the specific
glycosylation-modified polyketide; and isolating the specific glycosylation-modified polyketide
from the culture.
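
The strand reversal in the second method can be illustrated with a minimal sketch. It assumes a plain DNA string as input; the function names and the nine-nucleotide fragment are hypothetical and are not taken from the sequences of this disclosure.

```python
# Illustrative sketch only: the fragment below is a made-up example,
# not a portion of SEQ ID NO:1 or SEQ ID NO:2.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """Return the RNA that would pair with the mRNA specified by sense_dna.

    The mRNA has the same sequence as the sense strand (with U for T),
    so the antisense RNA is the reverse complement written with U.
    """
    return reverse_complement(sense_dna).upper().replace("T", "U")


fragment = "ATGAGCTCG"  # hypothetical sense-strand fragment
print(reverse_complement(fragment))  # CGAGCTCAT
print(antisense_rna(fragment))       # CGAGCUCAU
```

Placing the fragment in the reversed orientation behind a promoter corresponds, in this simplified picture, to transcribing the reverse complement rather than the sense strand.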

In a third method the biosynthesis of specific glycosylation-modified polyketides is
accomplished by isolating a sugar biosynthesis gene-containing DNA sequence from
those described above; identifying within the gene-containing DNA sequence one or more
DNA fragments responsible for the biosynthesis of a polyketide-associated sugar or its
attachment to the polyketide; introducing the DNA fragment or fragments into a
polyketide-producing microorganism whereupon transcription and translation of the DNA fragment
or fragments generate an altered polyketide-producing microorganism that is capable of
producing the specific glycosylation-modified polyketide; growing a culture of the
polyketide-producing microorganism containing the DNA fragment or fragments under
conditions suitable for the formation of the specific glycosylation-modified polyketide; and
isolating the specific glycosylation-modified polyketide from the culture.

Preferably, the sugar biosynthesis gene-containing DNA sequence of the processes 
described above comprises genes which encode an enzymatic activity involved in the 
biosynthesis of L-mycarose and/or D-desosamine. More preferably, the sugar biosynthesis 
gene-containing DNA sequence comprises the sequence of SEQ ID NO:2 from about 
nucleotide position 80 to about nucleotide position 1048. 

The present invention is especially useful in manipulating sugar biosynthesis genes 
from Streptomyces and Saccharopolyspora, organisms that provide over one-half of the 
clinically useful antibiotics. 

Brief Description of the Drawings 
FIG. 1A illustrates the organization of the erythromycin biosynthetic gene cluster and
the genetic designations of the biosynthetic genes; FIG. 1B illustrates an abbreviated
erythromycin biosynthetic scheme that broadly associates the biosynthetic genes with their
role in erythromycin biosynthesis. Seven eryB genes, eryBI - eryBVII, are responsible for the
biosynthesis of L-mycarose or its attachment to the erythronolide B ring, and six eryC genes,
eryCI - eryCVI, are responsible for the biosynthesis of D-desosamine or its attachment to
3-α-mycarosylerythronolide B. The dashed arrows indicate that the pathway through
erythromycin B is not the principal natural biosynthetic route to erythromycin A.

FIG. 2 illustrates the proposed scheme for the biosynthesis of L-mycarose and the 
eryB genes responsible for the specific steps. 

FIG. 3 illustrates the proposed scheme for the biosynthesis of D-desosamine and the 
eryC genes responsible for the specific steps. 

FIG. 4A(1-4) illustrates the nucleotide sequence (SEQ ID NO:1) of the sugar
biosynthesis genes eryCII (coordinates 54-1136), eryCIII (coordinates 1147-2412), and
eryBII (coordinates 2409-3410), with corresponding translation of the open reading frames
(SEQ ID NO:3, SEQ ID NO:4 and SEQ ID NO:5 respectively). Standard one letter codes for
the amino acids appear beneath their respective nucleic acid codons as described herein.

FIG. 4B(1-9) illustrates the nucleotide sequence (SEQ ID NO:2) of the sugar
biosynthesis genes eryBIV (coordinates 80-1048), eryBV (coordinates 1048-2295), eryCVI
(coordinates 2348-3061), eryBVI (coordinates 3214-4677), eryCIV (coordinates 4674-5879),
eryCV (coordinates 5917-7386), and eryBVII (coordinates 7415-7996) with corresponding
translation of the putative open reading frames (SEQ ID NO:6, SEQ ID NO:7, SEQ ID
NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11 and SEQ ID NO:12 respectively).
Standard one letter codes for the amino acids appear beneath their respective nucleic acid
codons as described herein.

FIG. 5A illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryBIV gene of Sac. erythraea (SEQ ID NO:6) and the sugar
biosynthesis enzymes encoded by the ascF gene of Yersinia pseudotuberculosis [Thorson et
al., J. Bacteriol., 176:5483 (1994)] (SEQ ID NO:13), the rfbJ gene of Salmonella enterica
[Jiang et al., Mol. Microbiol., 5:695 (1991)] (SEQ ID NO:14), the strL gene of Streptomyces
griseus [Pissowotzki et al., Mol. Gen. Genet. 241:193 (1993)] (SEQ ID NO:15) and the galE
gene of Escherichia coli [Lemaire and Hill, Nucl. Acids Res. 14:7705 (1986)] (SEQ ID
NO:16). In this and all other Figures in which amino acid sequence identity is compared,
capitalized letters represent consensus (identical) amino acids between species or amino acids
which are conservative substitutions for the consensus residues. Also in each Figure, the
sequence identified as "consensus" is merely a convenient representation of conserved amino
acids and is not intended as a representation of any existing polypeptide sequence.

FIG. 5B illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryBVII gene of Sac. erythraea (SEQ ID NO:12) and the sugar
biosynthesis enzymes encoded by the strM gene of Streptomyces griseus [Pissowotzki et al.,
Mol. Gen. Genet. 241:193 (1993)] (SEQ ID NO:17), the rfbC gene of Salmonella enterica
[Jiang et al., Mol. Microbiol., 5:695 (1991)] (SEQ ID NO:18), the rfbF gene of Yersinia
enterocolitica [Zhang et al., Mol. Microbiol., 9:309 (1993)] (SEQ ID NO:19), and the ascE
gene of Yersinia pseudotuberculosis [Thorson et al., J. Bacteriol., 176:5483 (1994)] (SEQ ID
NO:20).

FIG. 5C illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryCIV gene of Sac. erythraea (SEQ ID NO:10) and the sugar
biosynthesis enzymes encoded by the eryCI gene of Sac. erythraea [Dhillon et al., Mol.
Microbiol., 3:1405 (1989)] (SEQ ID NO:21), the ascC gene of Yersinia pseudotuberculosis
[Weigel et al., Biochemistry, 31:2129 (1992), Thorson et al., J. Am. Chem. Soc., 115:6993
(1993), Thorson et al., J. Bacteriol., 176:5483 (1994)] (SEQ ID NO:22), the dnrJ gene of
Streptomyces peucetius [Stutzman-Engwall et al., J. Bacteriol., 174:144 (1992)] (SEQ ID
NO:23), the prgl gene of Streptomyces alboniger [Lacalle et al., EMBO J., 11:785 (1992)]
(SEQ ID NO:24), and the strS gene of Streptomyces griseus [Distler et al., Gene, 115:105
(1992)] (SEQ ID NO:25).

FIG. 5D illustrates the amino acid sequence identity between the sugar biosynthesis
enzymes encoded by the eryBV and eryCIII genes of Sac. erythraea (SEQ ID NO:7 and SEQ
ID NO:4 respectively) and the sugar biosynthesis enzyme encoded by the dnrS gene of
Streptomyces peucetius [Otten et al., J. Bacteriol., 177:6688 (1995)] (SEQ ID NO:26).

FIG. 5E illustrates the amino acid sequence identity between the sugar biosynthesis
enzyme encoded by the eryCVI gene of Sac. erythraea (SEQ ID NO:8) and the sugar
biosynthesis enzymes encoded by the srmX gene of Streptomyces ambofaciens [Geistlich et
al., Mol. Microbiol., 6:2019 (1992)] (SEQ ID NO:27), the rdmD gene of Streptomyces
purpurascens [GenBank Accession: U10405] (SEQ ID NO:28) and the glycine
methyltransferase of Rattus norvegicus [Ogawa et al., Eur. J. Biochem. 168:141 (1987)]
(SEQ ID NO:29).

FIG. 6A through 6D illustrate the compounds conceivably formed in Examples 1-4
respectively and are representative of compounds formed from Type I (FIG. 6A), Type II
(FIG. 6B), and Type III (FIGS. 6C and 6D) alterations.

FIG. 7 illustrates the construction of the expression plasmid pASX2 described in
Example 2. For FIGS. 7-13 the following abbreviations have been used: amp, ampicillin
resistance gene; tsr, thiostrepton resistance gene; ROP, repressor of plasmid synthesis gene;
eryBI, eryBII, eryBIII, eryBIV, eryBV, eryBVI, eryBVII, eryCI, eryCII, eryCIII, eryCIV,
eryCV, and eryCVI, the erythromycin biosynthetic genes involved in the synthesis of
mycarose or its attachment to the macrolide ring (eryB) or the synthesis of desosamine or its
attachment to the macrolide ring (eryC) [the thin arrows above a gene indicate its relative size
and the direction of transcription]; ori-E. coli, an origin of DNA replication that functions in
E. coli, in the specific examples the ColE1 origin; ori-Streptomyces, an origin of DNA
replication that functions in Streptomyces, in the specific examples the pJV1 origin
[Servin-Gonzalez et al., Microbiology, 141:2499 (1995)]; p-ermE*, a modified promoter for the
erythromycin resistance gene; t-fd, the gene VIII transcription terminator of bacteriophage fd;
PCR, polymerase chain reaction. Restriction enzyme sites have been indicated by their
standard commercial names (e.g. BamHI, EcoRI). The abbreviations appended to the
large arrows in the plasmid synthetic schemes summarize each of the steps involved in the
plasmid constructions. These steps are described fully in the relevant Examples.

FIG. 8 illustrates the construction of the eryBVII antisense expression plasmid
pASBVII described in Example 2.

FIG. 9A illustrates the construction of the carrier plasmid pK1.

FIG. 9B-E illustrates the construction of plasmid pKB6 which carries all of the eryB
genes and is described in Example 3.

FIG. 10 illustrates the construction of expression plasmid pX1 described in Example 3.

FIG. 11 illustrates the construction of the eryB expression plasmids pXSB6 and pXB6
described in Example 3.

FIG. 12A-B illustrate the construction of plasmid pKC4 which carries all of the eryC
genes described in Example 4.

FIG. 13 illustrates the construction of the eryC expression plasmids pXSC4 and pXC4
described in Example 4.

Detailed Description of the Invention 

I. The Invention

The present invention provides isolated and purified polynucleotides that encode
enzymes or fragments thereof responsible for the biosynthesis of polyketide-associated sugars
or their attachment to polyketides, vectors containing those polynucleotides, host cells
transformed with those vectors, a process of making novel glycosylated polyketides using
those polynucleotides and vectors, and isolated and purified recombinant polypeptides and
polypeptide fragments thereof.

II. Definitions

For the purposes of the present invention as disclosed and claimed herein, the
following terms are defined.

The term "polyketide" as used herein refers to a large and diverse class of natural
products, including but not limited to antibiotic, antifungal, anticancer, and anti-helminthic
compounds. Antibiotics include, but are not limited to, anthracyclines and macrolides of
different types (polyenes and avermectins as well as classical macrolides such as
erythromycins).

The term "glycosylated polyketide" refers to any polyketide that contains one or more 
sugar residues. 
The term "glycosylation-modified polyketide" refers to a polyketide having a changed
glycosylation pattern or configuration relative to that particular polyketide's unmodified or
native state.

The term "polyketide-producing microorganism" as used herein includes any
microorganism that can produce a polyketide naturally or after being suitably engineered (i.e.
genetically). Examples of actinomycetes and the polyketides they naturally produce include
but are not limited to those listed in Table 1 below (see Hopwood, D.A. and Sherman, D.H.,
Annu. Rev. Genet., 24:37-66 (1990), incorporated herein by reference).

Table 1

Organism                        Polyketide Produced
Saccharopolyspora erythraea     Erythromycin
Streptomyces ambofaciens        Spiramycin
Streptomyces avermitilis        Avermectin
Streptomyces fradiae            Tylosin
Streptomyces griseus            Candicidin, monactin, griseusin
Streptomyces violaceoniger      Granaticin
Streptomyces thermotolerans     Carbomycin
Streptomyces rimosus            Oxytetracycline
Streptomyces peucetius          Daunorubicin
Streptomyces coelicolor         Actinorhodin
Streptomyces glaucescens        Tetracenomycin
Streptomyces roseofulvus        Frenolicin
Streptomyces cinnamonensis      Monensin
Streptomyces curacoi            Curamycin
Amycolatopsis mediterranei      Rifamycin

Other examples of polyketide-producing microorganisms that produce polyketides
naturally include various Actinomadura, Dactylosporangium and Nocardia strains.

The term "sugar biosynthesis genes" as used herein refers to sequences of DNA from 
Saccharopolyspora erythraea that encode sugar biosynthesis enzymes and is intended to 
include sequences of DNA from other polyketide-producing microorganisms which are 
identical or analogous to those obtained from Saccharopolyspora erythraea. 

The term "sugar biosynthesis enzymes" as used herein refers to polypeptides which 
are involved in the biosynthesis and/or attachment of polyketide-associated sugars and their 
derivatives and intermediates. 
The term "polyketide-associated sugar" refers to a sugar that is known to attach to
polyketides or that can be attached to polyketides by the processes described herein.

The term "sugar derivative" refers to a sugar which is naturally associated with a
polyketide but which is altered relative to the unmodified or native state; examples
include N-3-α-desdimethyl D-desosamine, D-mycarose, 4-keto-L-mycarose,
4-keto-D-mycarose, 3-desmethyl L-mycarose and 3-desmethyl D-mycarose.

The term "sugar intermediate" refers to an intermediate compound produced in a 
sugar biosynthesis pathway. 

The term "eryB" as used herein refers to sequences of DNA that encode enzymes
involved specifically in the biosynthesis of the deoxysugar L-mycarose.

The term "eryC" as used herein refers to sequences of DNA that encode enzymes
involved specifically in the biosynthesis of the deoxysugar D-desosamine.

III. Polynucleotides 

The organization of the segment of the Saccharopolyspora erythraea (Sac. erythraea)
chromosome that determines the biosynthesis of erythromycin and the corresponding genes
that determine the biosynthesis of the sugars L-mycarose and D-desosamine, designated
eryB and eryC, respectively, are shown in FIG. 1A. It is seen that several genes are required
for the biosynthesis of each of the sugars and that these genes are interspersed among one
another. It is predicted that each gene encodes an enzyme that catalyzes one or a few steps in
the biosynthesis of L-mycarose or D-desosamine from thymidine diphospho-4-keto-6-
deoxyglucose (TDP-glucose); these steps are outlined in FIG. 2 and FIG. 3. In the case of
L-mycarose (shown in FIG. 2), these steps include: (1) C-2" deoxygenation, (2) C-2"/C-3"
enoyl reduction, (3) C-5" epimerization, (4) C-3" C-methylation, (5) C-4" keto reduction, and
(6) transfer to erythronolide B. For D-desosamine (shown in FIG. 3), these steps comprise
(1) C-4'/3' isomerization, (2, 3) C-3' deoxygenation and reduction, (4) C-3' amination,
(5, 6) N-3α' N-dimethylation, and (7) transfer to mycarosyl erythronolide B.

This classification of genes (as belonging to either the eryB class or eryC class) was
determined by first altering the wild type genes of interest in an erythromycin producing
strain (i.e. in vivo) to inactivate their expression. The erythromycin products resulting from
such alterations were then analyzed. Genes whose alterations caused an accumulation of
erythronolide B (indicating a lack of L-mycarose, or failure to attach L-mycarose to the
erythronolide ring) were classified as eryB genes; genes whose alterations caused an
accumulation of 3-α-L-mycarosyl erythronolide B (indicating a lack of D-desosamine, or
failure to attach D-desosamine to the 3-α-L-mycarosyl erythronolide B ring) were classified
as eryC genes. Accordingly, it should be noted that all such genes identified herein as eryB
or eryC are involved in the synthesis of L-mycarose or D-desosamine. The predicted
functional activities of the polypeptides encoded by eryB and eryC will be discussed in
further detail below.
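
The classification rule just described can be written as a small decision function. This is a sketch under stated assumptions: the function name and the string labels for the accumulated intermediates are illustrative choices, and the analysis of fermentation products itself is not modeled.

```python
from typing import Optional

# Labels for the two diagnostic intermediates named in the text.
# The "alpha" spelling is accepted as a convenience (an assumption of this sketch).
ERYTHRONOLIDE_B = "erythronolide b"
MYCAROSYL_EB = {
    "3-α-l-mycarosyl erythronolide b",
    "3-alpha-l-mycarosyl erythronolide b",
}


def classify_gene(accumulated_intermediate: str) -> Optional[str]:
    """Classify an inactivated gene by the intermediate its mutant accumulates.

    Returns "eryB" when erythronolide B accumulates (L-mycarose is missing
    or not attached), "eryC" when 3-α-L-mycarosyl erythronolide B accumulates
    (D-desosamine is missing or not attached), and None otherwise.
    """
    name = accumulated_intermediate.strip().lower()
    if name == ERYTHRONOLIDE_B:
        return "eryB"
    if name in MYCAROSYL_EB:
        return "eryC"
    return None


print(classify_gene("Erythronolide B"))                  # eryB
print(classify_gene("3-α-L-mycarosyl erythronolide B"))  # eryC
```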

In one aspect then, the present invention provides isolated and purified eryB and eryC
polynucleotides from Sac. erythraea that encode enzymes involved in the production of
glycosylated polyketides. A polynucleotide of the present invention that encodes a sugar
biosynthesis enzyme is an isolated single or double stranded polynucleotide having a
nucleotide sequence which comprises (a) a nucleotide sequence selected from the group
consisting of (i) the sense sequence of FIG. 4A (SEQ ID NO:1) from about nucleotide
position 54 to about nucleotide position 1136; (ii) the sense sequence of SEQ ID NO:1 from
about nucleotide position 1147 to about nucleotide position 2412; (iii) the sense sequence of
SEQ ID NO:1 from about nucleotide position 2409 to about nucleotide position 3410; (iv)
the sense sequence of FIG. 4B (SEQ ID NO:2) from about nucleotide position 80 to about
nucleotide position 1048; (v) the sense sequence of SEQ ID NO:2 from about nucleotide
position 1048 to about nucleotide position 2295; (vi) the sense sequence of SEQ ID NO:2
from about nucleotide position 2348 to about nucleotide position 3061; (vii) the sense
sequence of SEQ ID NO:2 from about nucleotide position 3214 to about nucleotide position
4677; (viii) the sense sequence of SEQ ID NO:2 from about nucleotide position 4674 to
about nucleotide position 5879; (ix) the sense sequence of SEQ ID NO:2 from about
nucleotide position 5917 to about nucleotide position 7386; and (x) the sense sequence of
SEQ ID NO:2 from about nucleotide position 7415 to about nucleotide position 7996;

(b) sequences complementary to the sequences of (a),

(c) sequences that, when expressed, encode polypeptides encoded by the sequences of (a), and

(d) analogous sequences that hybridize under stringent conditions to the sequences of (a).

A preferred polynucleotide is a DNA molecule. In another embodiment, the polynucleotide
is an RNA molecule.

The nucleotide sequences and deduced amino acid residue sequences of the sugar
biosynthesis genes are set forth in FIG. 4A(1-4) and FIG. 4B(1-9). The nucleotide sequences
of FIG. 4A(1-4) (SEQ ID NO:1) and FIG. 4B(1-9) (SEQ ID NO:2) represent full length DNA
clones of the sense strand of two distinct clusters of sugar biosynthesis genes and are
intended to represent both the sense strand (shown on top) and its complement. The amino
acid sequences depicted below the sense strand correspond to polypeptides encoded by a
nucleotide sequence selected from the group consisting of (i) the sense sequence of SEQ ID
NO:1 from about nucleotide position 54 to about nucleotide position 1136, (ii) the sense
sequence of SEQ ID NO:1 from about nucleotide position 1147 to about nucleotide position
2412, (iii) the sense sequence of SEQ ID NO:1 from about nucleotide position 2409 to about
nucleotide position 3410, (iv) the sense sequence of SEQ ID NO:2 from about nucleotide
position 80 to about nucleotide position 1048, (v) the sense sequence of SEQ ID NO:2 from
about nucleotide position 1048 to about nucleotide position 2295, (vi) the sense sequence of
SEQ ID NO:2 from about nucleotide position 2348 to about nucleotide position 3061, (vii)
the sense sequence of SEQ ID NO:2 from about nucleotide position 3214 to about nucleotide
position 4677, (viii) the sense sequence of SEQ ID NO:2 from about nucleotide position 4674
to about nucleotide position 5879, (ix) the sense sequence of SEQ ID NO:2 from about
nucleotide position 5917 to about nucleotide position 7386 and (x) the sense sequence of
SEQ ID NO:2 from about nucleotide position 7415 to about nucleotide position 7996. The
polypeptides encoded by the nucleotide sequences of (i)-(x) above are set forth as
SEQ ID NO:3-SEQ ID NO:12 respectively.
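
The coordinate ranges recited above lend themselves to a simple arithmetic check. The sketch below is illustrative only: it treats each "about" position as an exact, 1-based, inclusive coordinate, takes the gene names from the descriptions of FIG. 4A and FIG. 4B, and ignores strand orientation; the helper names are hypothetical. Whether the final codon of each range is a stop codon is not stated here, so the codon counts are not polypeptide lengths.

```python
# Coordinates as recited in the text, treated as exact 1-based inclusive
# positions (an assumption; the text says "about").
ORFS = {
    "SEQ ID NO:1": [
        ("eryCII", 54, 1136),
        ("eryCIII", 1147, 2412),
        ("eryBII", 2409, 3410),
    ],
    "SEQ ID NO:2": [
        ("eryBIV", 80, 1048),
        ("eryBV", 1048, 2295),
        ("eryCVI", 2348, 3061),
        ("eryBVI", 3214, 4677),
        ("eryCIV", 4674, 5879),
        ("eryCV", 5917, 7386),
        ("eryBVII", 7415, 7996),
    ],
}


def codon_count(start: int, end: int) -> int:
    """Number of triplets in an inclusive coordinate range."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length // 3


def adjacent_overlaps(orfs):
    """List (gene, next_gene, shared_nucleotides) for overlapping neighbors."""
    ordered = sorted(orfs, key=lambda orf: orf[1])
    found = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        shared = end_a - start_b + 1
        if shared > 0:
            found.append((name_a, name_b, shared))
    return found


for seq_id, orfs in ORFS.items():
    for name, start, end in orfs:
        print(seq_id, name, codon_count(start, end), "codons")
    print(seq_id, "overlaps:", adjacent_overlaps(orfs))
```

Under these assumptions every recited range spans a whole number of codons, and three pairs of neighboring ranges share nucleotides (eryCIII/eryBII, eryBIV/eryBV and eryBVI/eryCIV).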

The present invention also contemplates analogous DNA sequences which hybridize
under stringent hybridization conditions to the DNA sequences set forth above. Stringent
hybridization conditions are well known in the art and define a degree of sequence identity
greater than about 80%-90%. The modifier "analogous" refers to those nucleotide sequences
that encode analogous polypeptides (i.e. in relation to a sugar biosynthesis enzyme),
analogous polypeptides being those which have only conservative differences and which
retain the conventional characteristics and activities of sugar biosynthesis enzymes. (A more
detailed description of analogous polypeptides is provided below.) The present invention
also contemplates naturally occurring allelic variations and mutations of the DNA sequences
set forth above so long as those variations and mutations code, on expression, for a sugar
biosynthesis enzyme of this invention as set forth hereinafter.
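
As a rough illustration of the identity threshold mentioned above, the following sketch computes percent identity between two sequences that have already been aligned to equal length. The function name, the gap convention ("-") and the ten-nucleotide strings are assumptions of this example; actual hybridization behavior depends on experimental conditions that this calculation does not model.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned, ungapped columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical sequences differing at one of ten positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```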

As is well known in the art, because of the degeneracy of the genetic code, there are
numerous other DNA and RNA molecules that can code for the same polypeptides as those
encoded by the aforementioned sugar biosynthesis genes and fragments thereof. The present
invention, therefore, contemplates those other DNA and RNA molecules which, on
expression, encode the polypeptides of SEQ ID NO:3-SEQ ID NO:11 or fragments thereof.
Having identified the amino acid residue sequence encoded by a sugar biosynthesis gene, and
with knowledge of all triplet codons for each particular amino acid residue, it is possible to
describe all such encoding RNA and DNA sequences. DNA and RNA molecules other than
those specifically disclosed herein, which are characterized simply by a change in a codon
for a particular amino acid, are within the scope of this invention.

The 20 common amino acids and their representative abbreviations, symbols and
codons are well known in the art (see for example, Molecular Biology of the Cell, Second
Edition, B. Alberts et al., Garland Publishing Inc., New York and London, 1989). As is also
well known in the art, codons constitute triplet sequences of nucleotides in mRNA molecules
and as such, are characterized by the base uracil (U) in place of the base thymine (T) which is
present in DNA molecules. A simple change in a codon for the same amino acid residue
within a polynucleotide will not change the structure of the encoded polypeptide. By way of
example, it can be seen from SEQ ID NO:1 that an AGC codon for serine exists at nucleotide
positions 126-128 and again at positions 420-422 and 561-563. However, it can also be seen
from that same sequence that serine can be encoded by a TCG codon (see e.g., nucleotide
positions 192-194) and a TCC codon (see e.g., nucleotide positions 204-206). Substitution of
the latter codons for serine with the AGC codon for serine, or vice versa, does not
substantially alter the DNA sequence of SEQ ID NO:1 and results in production of the same
polypeptide. In a similar manner, substitutions of the recited codons with other equivalent
codons can be made without departing from the scope of the present invention.
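
The codon equivalence discussed above can be demonstrated with a short sketch based on the standard genetic code. The helper names and the short test sequences are hypothetical; no part of SEQ ID NO:1 is reproduced here.

```python
from itertools import product
from math import prod

# Standard genetic code, written for DNA codons in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid: str) -> list:
    """All DNA codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def swap_codon(dna: str, codon_index: int, new_codon: str) -> str:
    """Replace one codon with a synonymous codon; reject non-synonymous swaps."""
    start = 3 * codon_index
    old_codon = dna[start:start + 3].upper()
    if CODON_TABLE[old_codon] != CODON_TABLE[new_codon.upper()]:
        raise ValueError("codons do not encode the same amino acid")
    return dna[:start] + new_codon + dna[start + 3:]


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    return prod(len(synonymous_codons(aa)) for aa in peptide)


print(synonymous_codons("S"))   # ['AGC', 'AGT', 'TCA', 'TCC', 'TCG', 'TCT']
print(translate("AGCTCGTCC"))   # SSS
print(translate(swap_codon("ATGTCGTAA", 1, "AGC")))  # MS*
print(count_encodings("MS"))    # 6
```

The three serine codons named in the text (AGC, TCG and TCC) all appear among the six serine codons of the standard code, which is why exchanging one for another leaves the translated polypeptide unchanged.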

A polynucleotide of the present invention can also be an RNA molecule. An RNA 
molecule contemplated by the present invention is complementary to or hybridizes under 
stringent conditions to any of the DNA sequences set forth above. Exemplary and preferred 
RNA molecules are mRNA molecules that encode sugar biosynthesis enzymes of this 
invention. 

IV. Polypeptides 

In another aspect, the present invention provides polypeptides which are reasonably 
believed to be sugar biosynthesis enzymes. A sugar biosynthesis enzyme of the present 
invention is a polypeptide of about 21 kdal to about 47 kdal. As set forth in FIG. 5A-5E, 
analogs of the predicted polypeptides encoded by certain eryB and eryC genes have been 
identified in various species and their sequences compared using the PRETTY routine 
(Genetics Computer Group (GCG) Sequence Analysis Software Package, Madison, WI). 
Due to the degree of amino acid sequence identity existing between the polypeptides of these 
other sugar biosynthesis genes and the polypeptides encoded by the eryB and eryC genes, 
certain enzymatic activities can reasonably be attributed to the eryB and eryC polypeptides. 
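
The percent identity figures quoted in the following paragraphs can be understood through
a minimal sketch of an identity calculation over a pairwise alignment. This is not the GCG
routine itself; the convention assumed here (identical residues divided by the number of
aligned columns in which neither sequence has a gap) is one of several in common use, and
the two aligned fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns. Other
    conventions (e.g. dividing by the shorter sequence length) give
    different numbers for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments (not taken from FIG. 5A-5E).
seq_a = "MKILVTGGAG"
seq_b = "MRILVTG-AG"
print(round(percent_identity(seq_a, seq_b), 1))
```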

By way of example, analogs of the polypeptide encoded by the eryBIV gene have 
been identified in Yersinia pseudotuberculosis, Salmonella enterica, Streptomyces griseus and 
Escherichia coli (see FIG. 5A). The various analogs range from 290 to 328
amino acid residues in length and are characterized by a low degree of amino acid sequence identity.
(For example, the identity between the sugar biosynthesis enzyme encoded by the eryBIV
gene of Sac. erythraea and the sugar biosynthesis enzyme encoded by the galE gene of E.
coli is 20% at the amino acid level.) However, a conserved amino acid sequence motif, G x x
G x x G (where G represents the amino acid glycine and x represents any other amino acid
residue), is found within the first 30 amino acid residues of all analogs shown. Since the
polypeptide encoded by the galE gene has been shown to be an epimerase whose mechanism
includes a ketoreduction (Bauer et al., Proteins 12:372 (1992)), the eryBIV gene product is
reasonably predicted to be a ketoreductase. 
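
A search for the G x x G x x G motif within the first 30 residues can be sketched as
follows. The test sequence is hypothetical, and "x" is treated here as any residue; a
stricter reading of "any other amino acid" would exclude glycine at the x positions, which
would correspond to the pattern G[^G]{2}G[^G]{2}G.

```python
import re

# "x" is treated as any residue; this is an assumption of the sketch.
MOTIF = re.compile(r"G..G..G")


def n_terminal_motif_position(protein: str, window: int = 30):
    """Return the 1-based start of the first GxxGxxG motif that lies
    entirely within the first `window` residues, or None if absent."""
    match = MOTIF.search(protein[:window])
    return match.start() + 1 if match else None


# Hypothetical N-terminal sequence (not one of the analogs of FIG. 5A).
example = "MRVLVTGGAGFIGSHLVRRL"
print(n_terminal_motif_position(example))

# A motif lying beyond the window is not reported.
print(n_terminal_motif_position("A" * 40 + "GAAGAAG"))
```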

As set forth in FIG. 5B, analogs of the sugar biosynthesis enzyme encoded by the
eryBVII gene have been identified in Streptomyces griseus, Salmonella enterica, Yersinia
enterocolitica and Yersinia pseudotuberculosis. The various analogs range
from 183 to 200 amino acid residues in length and are characterized by a moderate degree of
amino acid identity. By way of example, the identity at the amino acid level between the sugar
biosynthesis enzyme encoded by the eryBVII gene of Sac. erythraea and the sugar
biosynthesis enzyme encoded by the rfbC gene of Salmonella enterica or the strM gene of
Streptomyces griseus is 37% and 61%, respectively. Furthermore, a common characteristic
of these particular polypeptides (including that of eryBVII) is that they are associated only
with L-sugar biosynthesis and not with D-sugar biosynthesis. Thus the gene product of
eryBVII is reasonably predicted to function as a C-5 epimerase which converts the
stereochemistry of the sugar from the "D" configuration to the "L" configuration.

As set forth in FIG. 5C, analogs of the sugar biosynthesis enzyme encoded by the
eryCIV gene have been identified in Sac. erythraea and Yersinia pseudotuberculosis. The
predicted amino acid sequences of the protein products of eryCI and
eryCIV share 34% sequence identity with each other, and 27% and 25% identity, respectively, with the
predicted amino acid sequence encoded by ascC from Yersinia pseudotuberculosis. The
enzyme encoded by ascC has been shown to remove a hydroxyl group located at the C-3
position of L-ascarylose (Liu and Thorson, Annu. Rev. Microbiol. 48:223 (1994)). Thus, at
least one of the polypeptides encoded by eryCI or eryCIV is predicted to be an enzyme which
functions in deoxygenation reactions.

Furthermore, the enzyme encoded by the ascC gene requires the biochemical cofactor
pyridoxamine, which is the same cofactor used in biochemical transamination reactions.
Consequently, it has been proposed that some protein analogs (such as dnrJ from
Streptomyces peucetius, prgl from Streptomyces alboniger and strS from Streptomyces
griseus) having a moderate degree of sequence similarity to the polypeptide encoded by ascC
function as transaminases in amino sugar biosynthesis (Thorson et al., J. Am. Chem. Soc.
115:6993 (1993)). Since the biosynthesis of D-desosamine requires both deoxygenation and
transamination, it is reasonable to predict that at least one of the polypeptides encoded by the
eryCI or eryCIV genes functions in transamination reactions.

As set forth in FIG. 5D, the predicted polypeptides encoded by eryBV and eryCIII
share 43% identity at the amino acid level and, as such, may be assumed to have similar
activities with respect to their particular sugars. However, as shown in FIGS. 2 and 3, there
are no common steps in the proposed pathways of L-mycarose and D-desosamine
biosynthesis. Rather than having similar sugar biosynthesis functions, these polypeptides are
predicted to be nucleotidyl-sugar transferases which (in Sac. erythraea at least) function to
attach L-mycarose and D-desosamine to erythronolide B and 3-α-mycarosylerythronolide B,
respectively.

As set forth in FIG. 5E, analogs of the polypeptide encoded by the eryCVI gene have
been identified in Streptomyces ambofaciens, Streptomyces purpurascens, and Rattus
norvegicus. The various analogs range from 237 to 293 amino acid residues in length
and are characterized by a low to moderate degree of amino acid identity. By way of
example, the identity between the polypeptide encoded by the eryCVI gene of Sac. erythraea
and the glycine methyltransferase of Rattus norvegicus is 26% at the amino acid level.
Furthermore, these sugar biosynthesis enzymes share a common sequence motif,
LDVACGTG (SEQ ID NO:30 = amino acid positions 64-71 in the consensus sequence in
FIG. 5E), with rat glycine methyltransferase, whose biochemical function is known (Ogawa et
al., Eur. J. Biochem. 168:141 (1987)). Thus these polypeptides are predicted to be
N-methyltransferases.

In another aspect, the present invention provides a recombinant C-4" ketoreductase
from Sac. erythraea. A recombinant Sac. erythraea C-4" ketoreductase of the present
invention is a polypeptide of about 322 or fewer amino acid residues. A preferred recombinant
Sac. erythraea C-4" ketoreductase is that encoded by the nucleotide sequence of SEQ ID
NO:2 from about nucleotide position 80 to about nucleotide position 1048.
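
The relationship between the recited nucleotide positions and the recited polypeptide
length can be checked with simple arithmetic. The sketch below assumes that the positions
are inclusive and that the range ends with a stop codon; neither assumption is stated
explicitly in the text.

```python
def orf_residue_count(start_nt: int, end_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by an inclusive nucleotide range.

    Assumption: when `includes_stop` is True, the final codon of the range
    is a stop codon and does not contribute a residue.
    """
    length = end_nt - start_nt + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    codons = length // 3
    return codons - 1 if includes_stop else codons


# Positions 80 to 1048 span 969 nucleotides, i.e. 323 codons.
print(orf_residue_count(80, 1048))
```

Under these assumptions the range corresponds to 322 residues, consistent with the figure
of "about 322" given above.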

The present invention also contemplates amino acid residue sequences that are 
substantially duplicative of the sequences set forth herein such that those sequences 
demonstrate like biological activity to disclosed sequences. Such contemplated sequences 
include those analogous sequences characterized by a minimal change in amino acid residue 
sequence or type (e.g., conservatively substituted sequences) which insubstantial change does 
not alter the fundamental nature and biological activity of the aforementioned sugar 
biosynthesis enzymes. 

It is well known in the art that modifications and changes can be made in the structure 
of a polypeptide without substantially altering the biological function of that peptide. For 
example, certain amino acids can be substituted for other amino acids in a given polypeptide 
without any appreciable loss of function. In making such changes, substitutions of like amino 
acid residues can be made on the basis of relative similarity of side-chain substituents, for 
example, their size, charge, hydrophobicity, hydrophilicity, and the like.

As detailed in United States Patent No. 4,554,101, incorporated herein by reference, 
the following hydrophilicity values have been assigned to amino acid residues: Arg (+3.0); 
Lys (+3.0); Asp (+3.0); Glu (+3.0); Ser (+0.3); Asn (+0.2); Gln (+0.2); Gly (0); Pro (-0.5);
Thr (-0.4); Ala (-0.5); His (-0.5); Cys (-1.0); Met (-1.3); Val (-1.5); Leu (-1.8); Ile (-1.8); Tyr
(-2.3); Phe (-2.5); and Trp (-3.4). It is understood that an amino acid residue can be
substituted for another having a similar hydrophilicity value (e.g., within a value of plus or
minus 2.0) and still yield a biologically equivalent polypeptide.

In a similar manner, substitutions can be made on the basis of similarity in 
hydropathic index. Each amino acid residue has been assigned a hydropathic index on the 
basis of its hydrophobicity and charge characteristics. Those hydropathic index values are: 
Ile (+4.5); Val (+4.2); Leu (+3.8); Phe (+2.8); Cys (+2.5); Met (+1.9); Ala (+1.8); Gly (-0.4);
Thr (-0.7); Ser (-0.8); Trp (-0.9); Tyr (-1.3); Pro (-1.6); His (-3.2); Glu (-3.5); Gln (-3.5); Asp
(-3.5); Asn (-3.5); Lys (-3.9); and Arg (-4.5). In making a substitution based on the 
hydropathic index, a value of within plus or minus 2.0 is preferred. 
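
The hydropathic index criterion can be expressed as a simple lookup. The values below are
those listed in the preceding paragraph; the function name and the treatment of the 2.0
limit as inclusive are assumptions of this sketch, and passing the test does not by itself
establish that a given substitution preserves biological activity.

```python
# Hydropathic index values as listed in the text (three-letter residue codes).
HYDROPATHY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def within_hydropathic_tolerance(original: str, substitute: str,
                                 tolerance: float = 2.0) -> bool:
    """True if the two residues differ by no more than `tolerance`
    in hydropathic index (limit treated as inclusive, by assumption)."""
    return abs(HYDROPATHY[original] - HYDROPATHY[substitute]) <= tolerance


print(within_hydropathic_tolerance("Ile", "Val"))  # difference of 0.3
print(within_hydropathic_tolerance("Lys", "Arg"))  # difference of 0.6
print(within_hydropathic_tolerance("Ile", "Arg"))  # difference of 9.0
```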

V. Production of novel glycosylated polyketides 

In another aspect, the present invention comprises a general procedure for producing 
novel polyketide structures in vivo by selectively altering, inactivating, or augmenting the 
genetic information of the organism that naturally produces a related polyketide. That is, in 
the present invention, novel polyketides of desired structure are produced by manipulation of 
the eryB and/or eryC genes followed by their introduction into various polyketide-producing 
microorganisms. These manipulations result in the formation of "glycosylation-modified" 
polyketides (i.e. polyketides having an altered glycosylation pattern or configuration relative 
to their native state). For example, "glycosylation-modified" polyketides are those which 
have additional sugar groups attached (where none previously existed), different sugars (such 
as sugar intermediates) attached in place of the natural sugars or lack sugar groups (at 
positions where sugar groups previously existed). 

In the case of Type I and Type II alterations (further described below), glycosylation-modified
polyketides may arise through mechanisms which cause either (1) the
non-production of the sugar attachment enzyme (i.e. the enzyme involved in attachment of a sugar
to the polyketide structure) or (2) the non-production of a sugar biosynthesis enzyme. In
the first instance, the sugar will not be attached to the polyketide since the enzyme which
functions to attach the sugar will be lacking. In the second situation, a sugar intermediate
from the biosynthesis pathway will be produced (depending on which enzyme is lacking) and
attached to the polyketide provided it is recognized as a suitable substrate by the sugar
attachment enzyme; alternatively, it will not be recognized and therefore not attached. In the
case of Type III alterations (also described in detail below), glycosylation-modified
polyketides arise via attachment of additional or different sugars (i.e. not normally found in a
particular polyketide-producing strain) to the polyketide. It should be noted that these
postulated mechanisms are simply provided to enhance understanding of the novel processes
described herein; the actual mechanisms by which the Type I, II and III alterations produce
glycosylation-modified polyketides are not presently known.

In the first type of alteration (referred to herein as Type I alterations), genetically
altered eryB and/or eryC genes are introduced into the chromosome of Sac. erythraea or
another glycosylated polyketide-producing organism that also produces L-mycarose,
D-desosamine, or their closely related derivatives such as mycaminose (4-hydroxy
D-desosamine). The genetic alteration of an eryB and/or eryC gene is such that it causes a
non-functional enzyme to be synthesized. Once introduced into an appropriate strain, the altered
gene replaces its corresponding wild type gene causing the strain to lose the ability to 
produce a particular enzymatic activity involved in sugar biosynthesis. As a result, a 
glycosylation-modified polyketide is produced via either of the mechanisms previously 
described for a Type I alteration. 

In a Type I change described herein, a specific mutation in an eryB and/or eryC gene 
of the Sac. erythraea chromosome is accomplished by a three-step process which involves:
1) specifically altering the DNA sequence of a desired sugar biosynthesis gene, 2) subcloning 
the altered sequence into a suitable vector capable of recombining in the chromosome of an 
appropriate host and 3) introducing the vector containing the subcloned sequence into the 
appropriate host so that exchange of the wild type allele with the mutated one will occur. The 
first step is accomplished using standard recombinant DNA techniques to effect a deletion, 
base pair conversion or frame-shift in the DNA sequence. The second step, which also 
employs standard recombinant techniques, involves subcloning the altered sequence into a 
vector which does not replicate in Sac. erythraea or the desired host. In the final step, the 
vector is introduced into a suitable host, where by the process of gene replacement, the 
altered allele replaces the wild-type one. All techniques employed in a Type I change are 
well known to those of ordinary skill in the art. 

Example 1 illustrates the process of gene replacement of an eryB gene. As Example 1 
shows, the eryB gene of interest is mutated and along with adjacent upstream and 
downstream DNA sequences, cloned into a non-replicating Sac. erythraea plasmid vector. 
The vector carrying the mutated allele and adjoining DNA is then introduced into the host 
strain by the process of protoplast transformation. Transformants are regenerated under 
selective conditions (i.e. conditions that require expression of a particular plasmid marker) in 
order to induce recombination of the plasmid into the host cell chromosome. In other words, 
since the plasmid does not replicate autonomously, it must reside in the chromosome to be 
maintained in the cell and to express a particular marker under selective conditions. Insertion 
is achieved when the regenerated cells undergo a single homologous recombination between 
one of the two DNA segments that flank the mutation on the plasmid and its homologous 
counterpart in the chromosome. The cells are then grown without selection for the marker,
which induces plasmid loss from the chromosome. This loss arises after the cells have
undergone a second recombination between the second DNA segment that flanks the
mutation and its homologous chromosomal counterpart. This second recombinational event
results in the loss of the plasmid sequences and the wild type allele from the chromosome; the
mutant allele, however, is retained.
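
The two recombination events described above can be followed with a simplified model in
which the chromosome and the plasmid are represented as ordered lists of named segments.
The segment names and the functions `integrate` and `resolve` are hypothetical and serve
only to trace the order of segments; the model does not represent recombination
frequencies or any other biological detail.

```python
UP = "upstream_flank"
DOWN = "downstream_flank"


def integrate(chromosome, plasmid, arm):
    """Single crossover in `arm`: the circular plasmid is inserted so that
    the chromosome carries two copies of the arm flanking the insert."""
    k = plasmid.index(arm)
    rest_of_plasmid = plasmid[k + 1:] + plasmid[:k]  # open the circle at the arm
    i = chromosome.index(arm)
    return chromosome[:i + 1] + rest_of_plasmid + [arm] + chromosome[i + 1:]


def resolve(chromosome, arm):
    """Second crossover between the two copies of `arm`: everything
    between them is lost and one copy of the arm is retained."""
    first = chromosome.index(arm)
    second = chromosome.index(arm, first + 1)
    return chromosome[:first + 1] + chromosome[second + 1:]


wild_type = ["chr_left", UP, "eryB_wt", DOWN, "chr_right"]
plasmid = [UP, "eryB_mut", DOWN, "vector_marker"]  # circular, non-replicating

integrant = integrate(wild_type, plasmid, UP)  # first crossover, upstream arm
mutant = resolve(integrant, DOWN)              # second crossover, other arm
revertant = resolve(integrant, UP)             # second crossover, same arm

print(integrant)
print(mutant)
print(revertant)
```

In this model a second crossover in the arm not used for integration leaves the mutant
allele in the chromosome, whereas a second crossover in the same arm restores the wild
type arrangement.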

In a variation of a Type I change, the non-production of the sugar biosynthesis
enzyme (or attachment enzyme) may be achieved by the alternative mechanisms of promoter
inactivation and/or transcriptional terminator insertion. These variations do not affect the
gene sequence itself but rather the regulatory mechanisms involved in gene transcription.
"Promoter" as used herein refers to that region of a DNA molecule which controls the 
initiation of RNA transcription. Such regions are known to bind RNA polymerases (i.e. the 
enzymes involved in synthesizing RNA molecules). This form of Type I change (i.e.
promoter inactivation) involves two steps: 1) identifying the promoter region of the desired
gene and 2) rendering the promoter region inoperable by mutation. As in the replacement
mechanism described above, such mutations may be effected by creating deletions in the
promoter sequence or by base pair conversion. In the case where the promoter controls
transcription of a single gene, inactivation of the promoter will eliminate expression of that 
particular gene; of course, where the promoter controls expression of an entire operon (i.e. a 
series of genes whose expression is controlled by a single promoter), promoter inactivation 
will effectively eliminate expression of all genes in that operon. 

In a similar manner, the non-production of a sugar biosynthesis enzyme (or 
attachment enzyme) may arise from inserting a transcriptional terminator upstream from the 
gene to be inactivated. A "transcriptional terminator" as used herein is a nucleotide sequence 
which signals RNA polymerase to cease transcription. An example of a transcriptional
terminator is a palindromic sequence capable of forming a stem-loop structure that is
followed by a stretch of U residues (for example, the transcriptional terminator that follows
gene VIII of bacteriophage fd (Beck and Zink, Gene, 16:35 (1981))). Effecting a change in
expression of a sugar biosynthesis gene by this process involves 1) identifying the gene or
genes of interest (in the case of an operon arrangement) to be inactivated and 2) cloning a
transcriptional terminator sequence in a region of the DNA upstream from such gene(s). A
transcriptional terminator will cause the polymerase involved in RNA transcription to stop (at
or near the signaling region), thereby preventing transcription of any downstream sequences.
Thus, changes such as promoter inactivation and transcriptional terminator insertion, which directly
affect expression of sugar biosynthesis genes, are also intended to be within the scope of the
invention.
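
The structural features of such a terminator (an inverted repeat capable of forming a
stem-loop, followed by a run of T residues in the DNA, i.e. U residues in the transcript)
can be located with a simple scan. The sketch below is illustrative only: the stem length,
loop range and minimum T run are arbitrary assumptions, and the input is the reverse
complement of the oligonucleotide given as SEQ ID NO:32 in Example 2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminators(dna, stem=6, loop_range=(3, 8), min_t_run=4):
    """Return (start, left_stem, loop, right_stem) for each inverted repeat
    that is immediately followed by at least `min_t_run` T residues."""
    hits = []
    for i in range(len(dna) - stem + 1):
        left = dna[i:i + stem]
        for loop_len in range(loop_range[0], loop_range[1] + 1):
            r0 = i + stem + loop_len
            right = dna[r0:r0 + stem]
            tail = dna[r0 + stem:r0 + stem + min_t_run]
            if right == revcomp(left) and tail == "T" * min_t_run:
                hits.append((i, left, dna[i + stem:r0], right))
    return hits


# (-) strand oligonucleotide as listed for SEQ ID NO:32 in Example 2.
minus_strand = ("AGCTACGTTGAAAATCTCCAAAAAAAAAGGCTCCAAAA"
                "GGAGCCTTTAATTGTATCTAGAGCATGCCTGCAGACGCTG")
plus_strand = revcomp(minus_strand)

for hit in find_terminators(plus_strand):
    print(hit)
```

Several overlapping hits are reported because the same hairpin can be described with the
stem shifted by one or two bases; all of them correspond to the single stem-loop that
precedes the T run.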

In the second case (referred to herein as Type II alterations) eryB and/or eryC genes 
are arranged on a vector in an antisense orientation relative to a promoter capable of allowing 
expression of the gene in Sac. erythraea or Streptomyces. The vector is then introduced into 
a polyketide-producing microorganism. As a result of this vector construction, antisense
messenger RNA (mRNA) is produced which interferes with the translation of the wild-type
mRNA. As with the Type I manipulation, novel glycosylation-modified polyketides will
be produced in which the normal mycarose, desosamine, and/or closely related sugar residue
is lacking or is substituted by a sugar intermediate.

In a Type II change, inactivation of the eryB and/or eryC genes by antisense
expression is accomplished by a two-step procedure in which (1) a specific sugar biosynthesis
gene is subcloned into an expression vector in an antisense (i.e. reverse) orientation; and (2)
the antisense expression vector is introduced into the desired strain. The first step is
accomplished using standard recombinant DNA techniques employing either E. coli or
Streptomyces as the host, and an expression vector (capable of replicating in either host) that
can be assembled to contain a Streptomyces promoter. Streptomyces promoters may be
obtained from any commercially available Streptomyces plasmids or Streptomyces-E. coli
shuttle plasmids. In step 2, the antisense expression vector is introduced into a suitable
Streptomyces strain and the transformed cells are grown under selective conditions in order to
maintain the expression plasmid in the cell.

As described in Example 2, the gene to be inactivated is subcloned in its reverse
orientation downstream of a Streptomyces promoter (which is contained within a replicating
Sac. erythraea plasmid). The plasmid carrying the antisense gene is then introduced into the
host strain by protoplast transformation. Transformants are regenerated under selective
conditions in order to maintain the autonomously replicating plasmid in the cells. Subsequent
expression of the antisense gene causes the production of an antisense messenger RNA
(mRNA) that is complementary to the mRNA of the native allele of the selected gene.
Through standard nucleotide base pair interactions, the antisense mRNA and the native
mRNA form an RNA duplex that occludes the ribosome binding site of the native mRNA.
This interaction prevents ribosomal translation of the native mRNA and the corresponding
synthesis of the enzyme encoded by that mRNA. In this way, specific enzymatic steps in
sugar biosynthesis corresponding to the identity of the gene expressed in the antisense
orientation are blocked, leading to the production of novel sugar intermediates which, when
attached to the polyketide ring of the host microorganism, give rise to novel
glycosylation-modified polyketides. Alternatively, the antisense expression vector can be constructed using
a non-replicating Sac. erythraea vector that includes flanking DNA from a nonessential
region of the Sac. erythraea chromosome, such as the region immediately upstream from the
eryK gene (FIG. 1). This vector can then be used to stably insert the antisense construction
into the chromosome by homologous recombination in a fashion similar to that described for
the construction of a Type I alteration.
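
The complementarity between the native mRNA and the antisense transcript can be
illustrated as follows. The coding-strand fragment is hypothetical; it contains a
purine-rich stretch (AGGAGG) standing in for a ribosome binding site, followed by an ATG
start codon. The sketch checks base pairing only and does not model duplex stability or
translation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def sense_mrna(coding_strand: str) -> str:
    """mRNA has the sequence of the coding strand with U in place of T."""
    return coding_strand.replace("T", "U")


def antisense_rna(coding_strand: str) -> str:
    """Transcript obtained when the fragment is expressed in reverse
    orientation: the reverse complement, with U in place of T."""
    return coding_strand.translate(COMPLEMENT)[::-1].replace("T", "U")


def forms_full_duplex(mrna: str, antisense: str) -> bool:
    """True if every base of `mrna` pairs with the antiparallel strand."""
    return len(mrna) == len(antisense) and all(
        (a, b) in RNA_PAIRS for a, b in zip(mrna, reversed(antisense))
    )


# Hypothetical fragment: ribosome binding site stand-in, spacer, start codon.
coding = "AGGAGGTTCACATGAGC"
mrna = sense_mrna(coding)
anti = antisense_rna(coding)

print(mrna)
print(anti)
print(forms_full_duplex(mrna, anti))
```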

In the third case (referred to herein as Type III alterations), novel
glycosylation-modified polyketides of desired structure are produced by arranging all or a subset of the
eryB and/or eryC genes on a replicating vector and introducing these genes en bloc into a
"distinct" polyketide-producing organism, i.e. one other than the microorganism from which
the eryB and/or eryC genes were taken. As an example, eryB and/or eryC genes may be
taken from Sac. erythraea and introduced into Streptomyces violaceoniger or Streptomyces
venezuelae. In this case, mycarose, desosamine, their biochemical intermediates and/or their
closely related derivatives will be synthesized and attached at specific positions to polyketide
compounds that do not necessarily carry these, or any, sugar residues. Some examples of
novel glycosylated polyketides that may be produced in hosts that carry such manipulations
are shown in FIG. 6.

In Type III changes, the genes for the biosynthesis of mycarose and/or desosamine are
introduced into a polyketide-producing organism other than Sac. erythraea by another simple
two-step procedure: 1) all or a subset of the eryB and/or eryC genes are assembled together on
a replicating plasmid downstream of a Streptomyces promoter; and 2) the plasmid is
introduced into the polyketide-producing organism. Step 1 requires standard recombinant
DNA manipulations employing E. coli and/or Streptomyces as the host. Step 2 requires one
or more plasmids out of the several Streptomyces vectors or E. coli-Streptomyces shuttle
vectors available, one or more promoters that function in Streptomyces, and a selection for
the presence of the strain carrying the plasmid. As described in Examples 3 and 4, sets of the
eryB and/or eryC genes are sequentially subcloned together on a replicating vector
downstream of a suitable promoter that functions in the desired host. The plasmid carrying
the grouped genes is then introduced into the host strain by electroporation or by
transformation of protoplasts employing selection for a plasmid marker.

GENERAL METHODS 

Materials, Plasmids, and Bacterial Strains

Restriction endonucleases, T4 DNA ligase, competent E. coli DH5α cells, X-gal,
IPTG and plasmids pUC18, pUC19, and pBR322 were purchased from Bethesda Research
Laboratories (BRL), Gaithersburg, MD. VentR® DNA polymerase was purchased from New
England Biolabs (Beverly, MA). Plasmids pGEM®5Zf, pGEM®7Zf, and pGEM®11Zf were
from Promega, Madison, WI; plasmids pIJ4070 and pIJ702 were obtained from the John
Innes Institute, Norwich, England; and plasmids pWHM3 and pWHM4 (J. Bacteriol. 1989
171:5872) were obtained from C. R. Hutchinson, University of Wisconsin, Madison, WI.
[α-32P]dCTP, Hybond™-N nylon membranes, and Megaprime nick translation kits were
from Amersham Corp., Chicago, IL. SeaKem® LE agarose and SeaPlaque® low gelling
temperature agarose were from FMC Bioproducts, Rockland, ME. E. coli K12 strains
carrying the E. coli-Sac. erythraea shuttle plasmids pWHM3 and pWHM4 (Vara et al., J.
Bacteriol. 171:5872 (1989)) and pAIX have been deposited at the Agricultural Research
Culture Collection (NRRL), 1815 N. University Street, Peoria, Illinois 61604, as of
December 5, 1995, under the terms of the Budapest Treaty and will be maintained for a
period of thirty (30) years from the date of deposit, or for five (5) years after the last request
for the deposit, or for the enforceable period of the U.S. patent, whichever is longer.
Plasmids pWHM3, pWHM4 and pAIX were accorded the accession numbers NRRL
B-21512, NRRL B-21513 and NRRL B-21514, respectively. Sac. erythraea strain NRRL2338
is also available from the Agricultural Research Service culture collection. Staphylococcus
aureus ThR (thiostrepton resistant) was obtained by plating 10^8 cells of S. aureus on agar
medium containing 10 µg/ml thiostrepton and picking a survivor after 48 hr growth at 37°C.
Thiostrepton was obtained from Sigma Chemical, St. Louis, MO. All other chemicals and
reagents were from standard commercial sources unless otherwise specified.

DNA Manipulations

Standard conditions were employed for restriction endonuclease digestion, agarose
gel-electrophoresis, isolation of DNA fragments from low melting agarose gels, DNA
ligation, plasmid isolation from E. coli by alkaline lysis, and transformation of E. coli
employing selection for ampicillin resistance (150 µg/ml) on LB agar plates (Sambrook et al.,
Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Plainview,
NY, 1989). Total DNA from Sac. erythraea and Streptomyces species (including S. fradiae,
S. celestes, S. violaceoniger, S. hygroscopicus, S. venezuelae) was prepared according to
described procedures (Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory
Manual, John Innes Foundation, Norwich, UK (1985)). Transfer of DNA from agarose gels
to Hybond™-N membranes and Southern analysis using Megaprime™ nick translated probes
were performed according to the manufacturer's instructions.

Amplification of DNA Fragments

Synthetic deoxyoligonucleotides were synthesized on an ABI Model 380A
synthesizer (Applied Biosystems, Foster City, CA) following the manufacturer's
recommendations. Amplification of DNA fragments was performed by the polymerase chain
reaction (PCR) using a Perkin Elmer GeneAmp® PCR System 9600. Reactions contained
100 pmol of each primer, 1 µg of template DNA (chromosomal DNA from Sac. erythraea
NRRL2338), 2 units VentR® DNA polymerase in 100 µl volume of PCR buffer (10 mM KCl,
10 mM (NH4)2SO4, 20 mM Tris-HCl (pH 8.8 at 25°C), 2.5 mM MgSO4, 0.1% Triton®
X-100) containing dATP (200 µM), dTTP (200 µM), dCTP (250 µM), and dGTP (250 µM).
The reaction mixture was subjected to 30 cycles. Each cycle consisted of one period of 35
sec at 96°C and one period of 2 min at 72°C. The reaction products were visualized and
purified from low melting agarose. The PCR primers described in the examples were derived
from the nucleotide sequence of the eryB and eryC genes of FIG. 4.
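
The quantities recited above can be converted to working concentrations with simple
arithmetic. The sketch below uses only the amounts, volume and cycle times stated in the
text; the total hold time it reports excludes temperature ramping, which is not
specified.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """pmol per microliter is numerically equal to micromolar."""
    return amount_pmol / volume_ul


reaction_volume_ul = 100
primer_um = micromolar(100, reaction_volume_ul)   # 100 pmol of each primer
template_ng_per_ul = 1000 / reaction_volume_ul    # 1 microgram = 1000 ng

cycles = 30
seconds_per_cycle = 35 + 2 * 60                   # 35 sec at 96 C, 2 min at 72 C
hold_minutes = cycles * seconds_per_cycle / 60

print(primer_um, "micromolar of each primer")
print(template_ng_per_ul, "ng template per microliter")
print(hold_minutes, "minutes of programmed hold time")
```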

Transformation and Gene Replacement in Sac. erythraea

Protoplasts of Sac. erythraea strains were prepared and transformed with miniprep
DNA isolated from E. coli according to published procedures (Yamamoto et al., J.
Antibiotics, 39:1304 (1986)). Non-integrative transformants, in the case of pWHM4
derivatives, were selected by regenerating the protoplasts and overlaying with thiostrepton
(final concentration 20 µg/ml) as described (Weber et al., Gene, 68:173 (1988)). Integrative
transformants, in the case of pWHM3 derivatives, were selected on thiostrepton-containing
agar plates (15 µg/ml) as described by Weber et al., Gene, 68:173 (1988). Loss of the ThR
phenotype was monitored after two rounds of non-selective growth in SGGP media
(Yamamoto et al., J. Antibiotics, 39:1304 (1986)) followed by protoplasting and serial
dilution on non-selective agar media. Regenerated protoplasts were replica plated on
thiostrepton-containing media. ThS (thiostrepton-sensitive) colonies arose at a frequency of
10^-1. Retention of the mutant allele was established by Southern hybridization of several
ThS colonies.

Fermentation

Sac. erythraea or Streptomyces cells are inoculated into 100 ml SCM medium (1.5%
soluble starch, 2.0% Difco Soytone, 0.15% Yeast Extract, 0.01% CaCl2) and allowed to grow
for 3 to 6 days. The entire culture is then inoculated into 10 liters of fresh SCM medium.
The fermenter is operated for a period of 4 to 7 days at 32°C maintaining constant aeration
and pH at 7.0. After the fermentation is complete, the cells are removed by centrifugation at
4°C and the fermentation beer is kept cold until further use. When antibiotic selection to
maintain a plasmid, such as pXC4 or pXB6, is required, thiostrepton (10 µg/ml) is added to
both the 100 ml starter culture and the 10-liter fermenter.

The invention will be better understood in connection with the following examples,
which are intended as an illustration of and not a limitation upon the scope of the invention.
Both below and throughout the specification, it is intended that citations to the literature be 
expressly incorporated by reference. 

Example 1: Construction and characterization of Sac. erythraea ERBIV that produces
4"-deoxy-4"-oxo-erythromycin A



A. Construction of Plasmid pRBIV: A 4.3 kb PstI-HindIII fragment, which included
the eryBIV gene, was isolated from the plasmid pAIX5 and subcloned into PstI-HindIII
digested pUC19 to generate plasmid pUCBIV. After transformation and isolation of the
plasmid from E. coli, the identity of pUCBIV was confirmed by digestion with MunI which
released a fragment of 370 bp. Plasmid pUCBIV was then cut with the restriction enzyme
NcoI, the restriction site filled in with Klenow enzyme, and the plasmid religated to generate
plasmid pNCOBIV (which now carried a frameshift mutation in the eryBIV gene). After
transformation and isolation of the plasmid from E. coli, the identity of pNCOBIV was
confirmed by digestion with NsiI and HindIII which released a fragment of 1.59 kb. (The
NsiI site was formed by the fill-in and religation of the NcoI site.) Finally, plasmid
pNCOBIV was digested with HindIII and SstI and the 3.2 kb fragment carrying the altered
eryBIV gene was isolated and ligated into HindIII and SstI digested pWHM3 to generate
plasmid pRBIV. After transformation and isolation of the plasmid from E. coli, the identity
of pRBIV was confirmed by digestion with KpnI which released fragments of 5.2 kb, 4.4 kb,
and 0.72 kb.
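
The sequence consequence of filling in and religating an NcoI site can be traced with a
short sketch. NcoI recognizes CCATGG and cuts after the first C, leaving a four-base
CATG overhang; filling in both ends and joining them duplicates those four bases. The
input sequence below is hypothetical and is assumed to contain a single NcoI site; the
function name is likewise an assumption of the sketch.

```python
NCOI_SITE = "CCATGG"   # cut after the first C, leaving a CATG 5' overhang
NSII_SITE = "ATGCAT"


def fill_in_and_religate(dna: str) -> str:
    """Duplicate the four-base overhang at the first NcoI site, as happens
    when the cut ends are filled in and blunt-end ligated."""
    site = dna.find(NCOI_SITE)
    if site < 0:
        raise ValueError("no NcoI site found")
    cut = site + 1
    overhang = dna[cut:cut + 4]
    return dna[:cut] + overhang + dna[cut:]


# Hypothetical sequence with one NcoI site (not taken from eryBIV).
before = "ATGGCCACCATGGTCGACCTG"
after = fill_in_and_religate(before)

print(after)
print(len(after) - len(before))   # bases inserted
print(NSII_SITE in after)         # new NsiI site present
print(NCOI_SITE in after)         # original NcoI site retained?
```

Because four bases are inserted, which is not a multiple of three, the downstream reading
frame is shifted; the junction CCATGCATGG contains an NsiI site and no longer contains an
NcoI site, consistent with the diagnostic digests described in this example.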

B. Construction of Sac. erythraea ERBIV: Sac. erythraea protoplasts were
transformed with plasmid pRBIV and integrative transformants selected as described in
General Methods. Resolution of the integrants by nonselective growth as described in
General Methods yielded Sac. erythraea ERBIV in which the wild type copy of the eryBIV
gene was replaced with the inactive mutant copy. Gene replacement was confirmed by
Southern analysis of NcoI digested Sac. erythraea DNA and NcoI-NsiI digested Sac.
erythraea DNA using the 1.58 kb NcoI-HindIII fragment isolated from plasmid pUCBIV
(coordinates 681-2214, FIG. 4B) as a probe. Wild type Sac. erythraea and wild type
resolvants display a hybridizing DNA fragment of 2.75 kb when digested with either NcoI or
NcoI-NsiI, whereas Sac. erythraea strain ERBIV is characterized by hybridization to either a
16 kb DNA fragment or a 2.75 kb DNA fragment when digested with NcoI or NcoI-NsiI,
respectively.

C. Isolation, purification, and properties of 4"-deoxy-4"-oxo-erythromycin A from 
Sac. erythraea ERBIV: Sac. erythraea strain ERBIV is fermented for 4 days in SCM media 
as described in General Methods. The fermentation broth of Sac. erythraea ERBIV is then 
cooled to 4°C and adjusted to pH 4.0 and extracted once with methylene chloride. The 
aqueous layer is readjusted to pH 9.0 and extracted twice with methylene chloride and the 
combined basic methylene chloride extracts are concentrated to a solid residue. This is 
digested in methanol and chromatographed over a column of Sephadex LH-20 in methanol. 
Fractions are tested for bioactivity against a sensitive organism, such as Staphylococcus 
aureus ThR, and active fractions are combined. The combined fractions are concentrated and 
the residue is digested in 10 ml of the upper phase of a solvent system consisting of 
n-heptane, benzene, acetone, isopropanol, 0.05 M, pH 7.0 aqueous phosphate buffer 
(5:10:3:2:5, v/v/v/v/v), and chromatographed on an Ito Coil Planet Centrifuge in the same 
system. Active fractions are combined, concentrated and partitioned between methylene 
chloride and dilute ammonium hydroxide (pH 9.0). The methylene chloride layer is 
separated and concentrated to yield the desired product as a white foam. 
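
The 5:10:3:2:5 (v/v/v/v/v) ratio translates into component volumes by simple proportion. The sketch below is only an illustration: the 500 ml batch size is an assumed figure, not one given in the text, and the volumes are nominal amounts measured before mixing (the mixture separates into two phases, of which the upper is used).

```python
# Parts by volume, in the order the components are listed in the text.
SOLVENT_PARTS = {
    "n-heptane": 5,
    "benzene": 10,
    "acetone": 3,
    "isopropanol": 2,
    "0.05 M phosphate buffer, pH 7.0": 5,
}


def component_volumes(total_ml: float) -> dict:
    """Nominal volume (ml) of each component for a batch of total_ml."""
    total_parts = sum(SOLVENT_PARTS.values())
    return {
        name: total_ml * parts / total_parts
        for name, parts in SOLVENT_PARTS.items()
    }


if __name__ == "__main__":
    # 500 ml is a hypothetical batch size chosen for illustration.
    for name, volume in component_volumes(500).items():
        print(f"{name}: {volume:.0f} ml")
```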

Example 2: Construction and characterization of Sac. erythraea ER720(pASBVII) that 
produces 3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B 

A. Construction of plasmid pASX2 (see FIG. 7): The 290 bp EcoRI-BamHI segment 
carrying the ermE* promoter is isolated from plasmid pIJ4070 and ligated into EcoRI-BamHI 
digested pWHM4 DNA to form pASX1. After transformation and isolation of the plasmid 
from E. coli, the identity of pASX1 is confirmed by digestion with ApaLI, which releases 
fragments of 3.9 kb, 2.5 kb, 1.2 kb, 0.5 kb, and 0.4 kb. Two oligonucleotides of the 
sequences 
SEQ ID NO:31 (5'-GATCCAGCGTCTGCAGGCATGCTCTAGATACAATTAAAGGCTCCTTTTGGAGCCTTTTTTTTTGGAGATTTTCAACGT-3') and 
SEQ ID NO:32 (5'-AGCTACGTTGAAAATCTCCAAAAAAAAAGGCTCCAAAAGGAGCCTTTAATTGTATCTAGAGCATGCCTGCAGACGCTG-3'), 
corresponding to the (+) and (-) strands of the bacteriophage fd gene VIII transcription 
terminator (t-fd) (Beck et al. (1978) Nucl. Acids Res. 5:4495) and including restriction 
enzyme sites for the enzymes PstI, SphI, and XbaI and overhanging ends compatible with 
BamHI and HindIII, are synthesized, and approximately 250 ng of each oligonucleotide are 
then mixed together in TE buffer and heated to 99°C for 1 min. The solution is cooled slowly 
to room temperature, allowing the oligonucleotides to anneal due to self complementarity, 
and the annealed oligonucleotides are then ligated into BamHI-HindIII digested pASX1 to 
give pASX2. After transformation and isolation of the plasmid from E. coli, the identity of 
pASX2 is confirmed by DNA sequencing of the 1.2 kb EcoRI-SatL fragment that contains the 
ermE* promoter and the bacteriophage fd terminator. 

B. Construction of plasmid pASBVII (see FIG. 8): The 598 base pair DNA segment 
that carries the eryBVII gene, comprising coordinates 7398-7996 (FIG. 4B), is amplified by 
PCR employing two oligonucleotides, 
SEQ ID NO:33 (5'-GATCGCATGCTCTAGAGTACGTGAGCTGGCGGTGGCGGGC-3') and 
SEQ ID NO:34 (5'-GATCCGGATCCGCATGCTTCACCTGCCGGTGCTGGCGGG-3'). After digestion of 
the purified PCR product with BamHI-XbaI the PCR fragment is ligated to BamHI-XbaI 
digested pASX2 to give pASBVII. After transformation and isolation of the plasmid from E. 
coli, the identity of pASBVII is verified by DNA sequencing of the 880 bp EcoRI-XbaI 
insert. 

C. Construction of Sac. erythraea ER720(pASBVII): Sac. erythraea strain ER720 
protoplasts are transformed with plasmid pASBVII and transformants are selected for with 
thiostrepton (15 μg/ml). To confirm transformation, total DNA is isolated from ThR colonies 
and used to transform E. coli. After transformation and isolation of the plasmid from E. coli, 
the identity of pASBVII is verified by restriction analysis with the enzymes PvuII and BamHI, 
which releases a 1.48 kb fragment. Those Sac. erythraea colonies that are found to contain 
pASBVII are designated Sac. erythraea ER720(pASBVII). 

D. Isolation, purification, and properties of 
3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B from Sac. erythraea 
ER720(pASBVII): Sac. erythraea 
ER720(pASBVII) is fermented for 3 days in SCM media with thiostrepton selection as 
described in General Methods. The fermentation broth is then cooled to 4°C and adjusted to 
pH 4.0 and extracted once with methylene chloride. The aqueous layer is readjusted to pH 
9.0 and extracted twice with methylene chloride and the combined extracts are concentrated 
to a solid residue. This is digested in methanol and chromatographed over a column of 
Sephadex LH-20 in methanol. Fractions are tested for bioactivity against a sensitive 
organism, such as Staphylococcus aureus ThR, and active fractions are combined. The 
combined fractions are concentrated and the residue is digested in 10 ml of the upper phase of 
a solvent system consisting of n-heptane, benzene, acetone, isopropanol, 0.05 M, pH 7.0 
aqueous phosphate buffer (5:10:3:2:5, v/v/v/v/v), and chromatographed on an Ito Coil Planet 
Centrifuge in the same system. Active fractions are combined, concentrated and partitioned 
between methylene chloride and dilute ammonium hydroxide (pH 9.0). The methylene 
chloride layer is separated and concentrated to yield the desired product as a white foam. 

Example 3: Construction and characterization of Streptomyces antibioticus ATCC 
11891(pXB6) that produces 3-des-oleandrosyl-3-mycarosyl oleandomycin 

A. Construction of plasmid pKB6 and intermediates (see FIG. 9) 

i) Construction of plasmid pK1: The DNA sequences of pBR322 (GenBank 
Accession #: J01749) and pUC19 (GenBank Accession #: X02514) are known. The 805 nt 
DNA segment comprising coordinates 1673 through 2478 of pBR322 is amplified by PCR 
employing two oligodeoxynucleotides, 
SEQ ID NO:35 (5'-GATCACATGTTCTTTCCTGCGTTATCCCCTG-3') and 
SEQ ID NO:36 (5'-GATCGGATCCATGCATGTCTAGAGCATCGCAGGATGCTGCTGGC-3'). After digestion 
of the purified PCR product with AflIII and BamHI the fragment is ligated into AflIII and 
BamHI digested pUC19 to give plasmid pK1. The identity of plasmid pK1, after 
transformation and isolation from E. coli, is verified by PvuII digestion, which releases 
fragments of 0.55 kb and 2.55 kb. Plasmid pK1 contains the ROP region of pBR322 that 
controls plasmid copy number. 

ii) Construction of plasmid pKB1: The 2.24 kb DNA segment that carries the 
eryBIV and eryBV genes, comprised between coordinates 56 and 2296 of the sequence 
presented in SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:37 (5'-GAATGCATCCTGGAAAGCGAGCAAATGCTCCGGTG-3') and 
SEQ ID NO:38 (5'-GATCTAGAGCTAGCCGGCGTGGCGGCGCGTG-3'). After digestion with 
NsiI and XbaI the fragment is ligated into NsiI and XbaI digested pK1 to yield plasmid pKB1, 
5.3 kb in size. The identity of plasmid pKB1, after transformation and isolation from E. coli, 
is verified by KpnI digestion, which releases fragments of 0.72 kb, 1.14 kb and 3.42 kb. 

iii) Construction of plasmid pKB2: The 1.56 kb DNA segment that carries 
the eryBVI gene, comprised between coordinates 3121 and 4677 of the sequence presented in 
SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:39 (5'-GATCGCTAGCCGTGACCGGACCCTTACAGTGAGTG-3') and 
SEQ ID NO:40 (5'-GATCTAGACTTAAGTCATCCGGCGGTCCTGGTGTAGACGGC-3'). After digestion 
with NheI and XbaI the fragment is ligated into NheI and XbaI digested pKB1 to give plasmid 
pKB2, 6.9 kb in size. The identity of plasmid pKB2, after transformation and isolation from 
E. coli, is confirmed by BamHI digestion, which releases fragments of 0.22 kb, 0.40 kb, 2.6 
kb and 3.7 kb. 

iv) Construction of plasmid pKB3: The 0.6 kb DNA segment that carries the 
eryBVII gene, comprised between coordinates 7385 and 7987 of the sequence presented in 
SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:41 (5'-GATCTTAAGAACCGGAGTTGCGAGTACGTGAGCTGGCG-3') and 
SEQ ID NO:42 (5'-GATCTAGACCTAGGTCACCTGCCGGTGCTGGCGGGCTC-3'). After digestion with 
AflII and XbaI the fragment is ligated into AflII and XbaI digested pKB2 giving plasmid 
pKB3, 7.5 kb in size. The identity of plasmid pKB3, after transformation and isolation from 
E. coli, is verified by PstI digestion, which releases fragments of 1.1 kb and 6.4 kb. 

v) Construction of plasmid pKB4: The 1.0 kb DNA segment that carries the 
eryBII gene, comprised between coordinates 2385 and 3410 of the sequence presented in 
SEQ ID NO:1, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:43 (5'-GATCCTAGGCCGCAGGAAGGAGAGAACCACG-3') and 
SEQ ID NO:44 (5'-GATCTAGATTAATCACrGCAACCAGGCTTCCGGC-3'). Following digestion with 
AvrII and XbaI the fragment is ligated into AvrII and XbaI digested pKB3 yielding the desired 
plasmid pKB4. After transformation and isolation of the plasmid from E. coli, the identity of 
pKB4, 8.5 kb in size, is verified by BglII and EcoRI digestion, which releases fragments of 
0.41 kb, 1.6 kb, 3.1 kb and 3.4 kb. 

vi) Construction of plasmid pKB5: The DNA sequence of eryBIII has been 
reported (Haydock et al. (1991) Mol Gen Genet 230:120). The 1.3 kb DNA segment that 
carries the eryBIII gene, comprised between coordinates 3965 and 5232 of the sequence 
depicted in Haydock et al., is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:45 (5'-GATTAATTGGCCGCGGCGCCGCGCTCGTTATG-3') and 
SEQ ID NO:46 (5'-GATCTAGATAATTAATCATACGACTTCCAGTCGGGGTAG-3'). After digestion 
with MseI and XbaI the fragment is ligated into MseI and XbaI digested pKB4 to give the 
desired plasmid pKB5, 9.8 kb in size. The identity of pKB5, after transformation and 
isolation from E. coli, is verified by PstI digestion, which releases fragments of 1.1 kb, 2.5 kb, 
and 6.1 kb, visualized by gel electrophoresis. 

vii) Construction of plasmid pKB6: The eryBI gene has been mapped 
(Haydock et al. (1991) Mol Gen Genet 230:120) and the DNA sequence on both flanks of 
eryBI is known (Haydock et al. (1991) Mol Gen Genet 230:120 and GenBank Accession # 
M11200). The 2.5 kb DNA segment that carries the eryBI gene, comprised between 
coordinates 1.1 and 3.6 of the map presented in Haydock et al., is amplified by PCR 
employing two deoxyoligonucleotides: 
SEQ ID NO:47 (5'-GATTAATTAATGATCAAGCTGAAAATTGTTTGCATG-3') and 
SEQ ID NO:48 (5'-GATCTAGACTGCCGGCTCAGCCTTCCCAGGTTCG-3'). After digestion with 
PacI and XbaI the fragment is ligated into PacI and XbaI digested pKB5 to give plasmid 
pKB6, 12.3 kb in size. The identity of pKB6, after transformation and isolation from E. coli, 
is verified by BamHI digestion, which releases fragments of 0.22 kb, 0.40 kb, 1.4 kb, 2.6 kb, 
3.3 kb and 4.4 kb. Plasmid pKB6 carries all of the eryB genes, eryBI-eryBVII, that are 
involved in the biosynthesis of mycarose and its attachment to the polyketide. 
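
The stepwise assembly above relies on each reverse primer carrying a new restriction site next to the XbaI site, which the forward primer of the following step then reuses. The sketch below checks this for the pKB1 to pKB3 primers only. It assumes the standard recognition sequences for the named enzymes, takes the primer sequences from the text with line-break hyphens removed, and uses illustrative helper names (`sites_in`, `handoffs`) that are not part of the described method.

```python
# Recognition sequences (5'->3') of the enzymes named in steps ii) to iv).
SITES = {
    "NsiI": "ATGCAT",
    "NheI": "GCTAGC",
    "AflII": "CTTAAG",
    "AvrII": "CCTAGG",
    "XbaI": "TCTAGA",
}

# (forward primer, reverse primer) for each plasmid: SEQ ID NO:37 to 42.
PRIMERS = {
    "pKB1": ("GAATGCATCCTGGAAAGCGAGCAAATGCTCCGGTG",
             "GATCTAGAGCTAGCCGGCGTGGCGGCGCGTG"),
    "pKB2": ("GATCGCTAGCCGTGACCGGACCCTTACAGTGAGTG",
             "GATCTAGACTTAAGTCATCCGGCGGTCCTGGTGTAGACGGC"),
    "pKB3": ("GATCTTAAGAACCGGAGTTGCGAGTACGTGAGCTGGCG",
             "GATCTAGACCTAGGTCACCTGCCGGTGCTGGCGGGCTC"),
}


def sites_in(seq: str) -> set:
    """Names of the enzymes whose recognition sequence occurs in seq."""
    return {name for name, site in SITES.items() if site in seq}


def handoffs(primers: dict) -> list:
    """For consecutive steps, list the sites shared by the reverse primer of
    one step and the forward primer of the next step."""
    names = list(primers)
    result = []
    for prev, nxt in zip(names, names[1:]):
        shared = sites_in(primers[prev][1]) & sites_in(primers[nxt][0])
        result.append((prev, nxt, sorted(shared)))
    return result


if __name__ == "__main__":
    for plasmid, (fwd, rev) in PRIMERS.items():
        print(plasmid, sorted(sites_in(fwd)), sorted(sites_in(rev)))
    print(handoffs(PRIMERS))
```

Each reverse primer scanned here contains XbaI plus one further site, and that further site is the one used to open the plasmid in the next step.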

B. Construction of Plasmid pXSB6 (see FIG. 11): The 9.2 kb NsiI-XbaI segment of 
pKB6, prepared as described in Example 3(A)(vii) above, that carries all of the eryB genes is 
isolated and ligated into PstI-XbaI digested pASX2, prepared as described in Example 2(A) 
above, to give plasmid pXSB6. After transformation and isolation of the plasmid from E. 
coli, the identity of pXSB6, 17.2 kb in size, is verified by the observation of fragments of 
0.41 kb, 1.9 kb, and 14.9 kb after EcoRI digestion. Plasmid pXSB6 carries all of the eryB 
genes in a transcriptional fusion downstream of the ermE* promoter on an E. coli- 
Streptomyces shuttle plasmid. 
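
Because a complete digest of a circular plasmid yields fragments that together account for the whole molecule, the reported fragment sizes can be compared against the stated plasmid sizes. The sketch below does this for a selection of the plasmids above. The values are copied from the text, the 0.15 kb tolerance is an assumption chosen to allow for rounding of gel estimates, and the function name is illustrative.

```python
# (plasmid, stated size in kb, fragment sizes in kb from the diagnostic digest)
DIGESTS = [
    ("pKB1", 5.3, [0.72, 1.14, 3.42]),               # KpnI
    ("pKB2", 6.9, [0.22, 0.40, 2.6, 3.7]),           # BamHI
    ("pKB4", 8.5, [0.41, 1.6, 3.1, 3.4]),            # BglII + EcoRI
    ("pKB5", 9.8, [1.1, 2.5, 6.1]),                  # PstI
    ("pKB6", 12.3, [0.22, 0.40, 1.4, 2.6, 3.3, 4.4]),  # BamHI
    ("pXSB6", 17.2, [0.41, 1.9, 14.9]),              # EcoRI
]

TOLERANCE_KB = 0.15  # assumed allowance for rounding of gel-based estimates


def digest_discrepancy(stated_kb: float, fragments_kb: list) -> float:
    """Sum of fragment sizes minus the stated plasmid size, in kb."""
    return round(sum(fragments_kb) - stated_kb, 2)


if __name__ == "__main__":
    for name, stated, fragments in DIGESTS:
        diff = digest_discrepancy(stated, fragments)
        status = "ok" if abs(diff) <= TOLERANCE_KB else "check"
        print(f"{name}: stated {stated} kb, fragments sum to "
              f"{sum(fragments):.2f} kb, difference {diff:+.2f} kb ({status})")
```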

C. Construction of Plasmid pXB6 

i) Construction of plasmid pN702 (see FIG. 10): Two oligonucleotides of the 
sequences SEQ ID NO:49 (5'-GGAATTCAGATCTATGCATTCTAGAA-3') and 
SEQ ID NO:50 (5'-CGCGTTCTAGAATGCATAGATCTGAATTCCTGCA-3') that include 
restriction enzyme sites for the enzymes EcoRI, BglII, NsiI, and XbaI and overhanging ends 
compatible with PstI and MluI are synthesized. Approximately 250 ng of each 
oligonucleotide are then mixed together in TE buffer and heated to 99°C for 1 min. After the 
solution is cooled slowly to room temperature, allowing the oligonucleotides to anneal due to 
self complementarity, the annealed oligonucleotides are ligated into PstI-MluI digested 
pIJ702 to yield the desired plasmid pN702. After transformation and isolation of the plasmid 
from Streptomyces lividans 1326, the identity of plasmid pN702, 4.3 kb in size, is verified by 
the observation of fragments of 0.75 kb and 3.6 kb after EcoRI-BamHI or XbaI-BamHI 
digestion. 
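
The design of the two adapter oligonucleotides can be checked directly from their sequences. The sketch below assumes the standard recognition sequences for EcoRI, BglII, NsiI, and XbaI, and the usual overhangs left by MluI (5'-CGCG) and PstI (3'-TGCA). The function names are illustrative and not part of the described method.

```python
SEQ_49 = "GGAATTCAGATCTATGCATTCTAGAA"
SEQ_50 = "CGCGTTCTAGAATGCATAGATCTGAATTCCTGCA"

SITES = {
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "NsiI": "ATGCAT",
    "XbaI": "TCTAGA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_overhangs(top: str, bottom: str) -> tuple:
    """Split the bottom strand into (5' extension, paired region, 3' extension)
    relative to the stretch that base-pairs with the whole top strand."""
    core = reverse_complement(top)
    start = bottom.find(core)
    if start < 0:
        raise ValueError("top strand does not pair fully with bottom strand")
    return bottom[:start], core, bottom[start + len(core):]


def site_positions(seq: str) -> dict:
    """0-based start of each recognition site in seq (-1 if absent)."""
    return {name: seq.find(site) for name, site in SITES.items()}


if __name__ == "__main__":
    five_prime, paired, three_prime = adapter_overhangs(SEQ_49, SEQ_50)
    print("5' extension of SEQ ID NO:50:", five_prime)
    print("3' extension of SEQ ID NO:50:", three_prime)
    print("site positions in SEQ ID NO:49:", site_positions(SEQ_49))
```

SEQ ID NO:49 pairs along its full length with the middle of SEQ ID NO:50, leaving CGCG at the 5' end and TGCA at the 3' end of SEQ ID NO:50, which is what ligation into MluI and PstI ends would require. The four sites appear on SEQ ID NO:49 in the order EcoRI, BglII, NsiI, XbaI.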

ii) Construction of plasmid pX1 (see FIG. 10): The 290 bp EcoRI-BamHI 
segment that carries the ermE* promoter is isolated from plasmid pIJ4070 and ligated into 
EcoRI-BglII digested pN702 to give the desired plasmid pX1. After transformation and 
isolation of the plasmid from Streptomyces lividans 1326, the identity of plasmid pX1, 
4.6 kb in size, is verified by the observation of fragments of 1.0 kb and 3.6 kb after 
NsiI-BamHI digestion. 

iii) Construction of plasmid pXB6 (see FIG. 11): The 9.2 kb NsiI-XbaI 
segment of pKB6, prepared as described in Example 3(A)(vii) above, that carries all of the 
eryB genes is isolated and ligated into NsiI-XbaI digested pX1 to give the desired plasmid 
pXB6. After transformation and isolation of the plasmid from Streptomyces lividans 1326, 
the identity of plasmid pXB6, 13.8 kb in size, is verified by the observation of fragments of 
0.41 kb, 1.9 kb, and 11.5 kb after EcoRI digestion. Plasmid pXB6 carries all of the eryB 
genes in a transcriptional fusion to the ermE* promoter on a Streptomyces plasmid. 

D. Construction of Streptomyces antibioticus ATCC 11891(pXB6): Approximately 
500 μg of plasmid pXB6, isolated from Streptomyces lividans 1326(pXB6), are 
electroporated into the oleandomycin producer Streptomyces antibioticus ATCC 11891 and 
several of the resulting ThioR colonies that appear on the R3M-agar plates containing 
thiostrepton are analyzed for their plasmid content. The presence of plasmid pXB6, 13.8 kb 
in size, is verified by the observation of fragments of 0.41 kb, 1.9 kb, and 11.5 kb after EcoRI 
digestion. 

E. Isolation, purification, and properties of 3-des-oleandrosyl-3-mycarosyl 
oleandomycin from Streptomyces antibioticus ATCC 11891(pXB6): Streptomyces 
antibioticus ATCC 11891(pXB6) is fermented for 5 days in SCM media with thiostrepton 
selection as described in General Methods. The fermentation broth is then cooled to 4°C and 
adjusted to pH 4.0 and extracted once with methylene chloride. The aqueous layer is 
readjusted to pH 9.0 and extracted twice with methylene chloride and the combined extracts 
are concentrated to a solid residue. This is digested in methanol and chromatographed over a 
column of Sephadex LH-20 in methanol. Fractions are tested for bioactivity against a 
sensitive organism, such as Staphylococcus aureus ThR, and active fractions are combined. 
The combined fractions are concentrated and the residue is digested in 10 ml of the upper 
phase of a solvent system consisting of n-heptane, benzene, acetone, isopropanol, 0.05 M, pH 
7.0 aqueous phosphate buffer (5:10:3:2:5, v/v/v/v/v), and chromatographed on an Ito Coil 
Planet Centrifuge in the same system. Closely eluting active fractions are combined, 
concentrated and partitioned between methylene chloride and dilute ammonium hydroxide 
(pH 9.0). The methylene chloride layer is separated and concentrated to yield the desired 
product as a white foam. 

Example 4: Construction and characterization of Streptomyces violaceoniger NRRL 
2834(pXC4) that produces 5-des-chalcosyl-5-desosaminoyl lankamycin 

A. Construction of plasmid pKC4 and intermediates (see FIG. 12) 

i) Construction of plasmid pKC1: The 2.4 kb DNA segment that carries the 
eryCII and eryCIII genes, comprised between coordinates 33 and 2413 of the sequence 
presented in SEQ ID NO:1, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:51 (5'-GAATGCATCTGGCTGGGCGGAGGGAATTCATG-3') and 
SEQ ID NO:52 (5'-GATCTAGACTTAAGTCATCGTGGTTCTCTCCTTCCTGCGGC-3'). After digestion 
with NsiI and XbaI the purified PCR fragment is ligated into NsiI and XbaI digested pK1 to 
give plasmid pKC1, 5.5 kb in size. The identity of plasmid pKC1, after transformation and 
isolation from E. coli, is verified by EcoRI digestion, which releases fragments of 2.2 kb 
and 3.3 kb. 

ii) Construction of plasmid pKC2: The 732 bp DNA segment that carries the 
eryCVI gene, comprised between coordinates 2331 and 3063 of the sequence presented in 
SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:53 (5'-GATCCTTAAGCTCCGGAGGGAGCAGGGATG-3') and 
SEQ ID NO:54 (5'-GATCTAGACCTAGGTCATCCGCGCACACCGACGAAC-3'). After 
digestion with AflII and XbaI the purified PCR fragment is ligated into AflII and XbaI 
digested pKC1 to give plasmid pKC2, 6.2 kb in size. The identity of plasmid pKC2, after 
transformation and isolation from E. coli, is verified by XbaI-EcoRI digestion, which releases 
fragments of 0.95 kb, 2.2 kb and 3.1 kb. 

iii) Construction of plasmid pKC3: The 2.7 kb DNA segment that carries the 
eryCIV and eryCV genes, comprised between coordinates 4650 and 7386 of the sequence 
presented in SEQ ID NO:2, is amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:55 (5'-GATCCTAGGCCGTCTACACCAGGACCGCCGG-3') and 
SEQ ID NO:56 (5'-GATCTAGATTAATCACCTTCCGCGCAGGAAGCCGC-3'). After 
digestion with AvrII and XbaI the purified PCR fragment is ligated into AvrII and XbaI 
digested pKC2 to yield plasmid pKC3, 9.0 kb in size. The identity of plasmid pKC3, after 
transformation and isolation from E. coli, is verified by SphI digestion, which releases 
fragments of 4.0 kb and 5.0 kb. 

iv) Construction of plasmid pKC4: The DNA sequence of the eryCI gene has 
been determined (GenBank Accession #X15541). The 1.1 kb DNA segment that carries the 
eryCI gene, comprised between coordinates 38 and 1161 of the sequence indicated above, is 
amplified by PCR employing two deoxyoligonucleotides, 
SEQ ID NO:57 (5'-GATCTTAAGCCGCCACTCGAACGGACACTCG-3') and 
SEQ ID NO:58 (5'-GATCTAGATCAAGCCCCAGCCTTGAGGG-3'). After digestion with MseI and 
XbaI the fragment is ligated into MseI and XbaI digested pKC3 to give plasmid pKC4, 
10.1 kb in size. The identity of plasmid pKC4, after transformation and isolation from 
E. coli, is verified by KpnI digestion, which releases fragments of 0.15 kb, 0.31 kb, 4.1 kb 
and 5.5 kb. Plasmid pKC4 carries all of the eryC genes, eryCI-eryCVI, that are involved in 
the biosynthesis of desosamine and its attachment to the polyketide. 
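
The segment lengths quoted for the eryC amplifications follow from the stated coordinates. The sketch below recomputes them, assuming (as the 732 bp eryCVI figure suggests) that length is taken as end minus start. The 50 bp tolerance for the lengths quoted in rounded kb is an assumption, and the function name is illustrative.

```python
# (gene(s), start coordinate, end coordinate, length as stated in the text, bp)
SEGMENTS = [
    ("eryCII-eryCIII", 33, 2413, 2400),   # stated as 2.4 kb
    ("eryCVI", 2331, 3063, 732),          # stated as 732 bp
    ("eryCIV-eryCV", 4650, 7386, 2700),   # stated as 2.7 kb
    ("eryCI", 38, 1161, 1100),            # stated as 1.1 kb
]

ROUNDING_TOLERANCE_BP = 50  # assumed allowance for lengths quoted to 0.1 kb


def span_bp(start: int, end: int) -> int:
    """Segment length using the end-minus-start convention."""
    return end - start


if __name__ == "__main__":
    for gene, start, end, stated in SEGMENTS:
        length = span_bp(start, end)
        print(f"{gene}: {length} bp from coordinates, "
              f"{stated} bp stated, difference {length - stated:+d} bp")
```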

B. Construction of Plasmid pXSC4 (see FIG. 13): The 6.9 kb NsiI-XbaI segment of 
pKC4 that carries all of the eryC genes is isolated and ligated into PstI-XbaI digested pASX2, 
prepared as described in Example 2(A), to give the desired plasmid pXSC4, 14.9 kb in size, 
wherein all of the eryC genes are transcriptionally linked downstream of the ermE* promoter 
on an E. coli-Streptomyces shuttle plasmid. The identity of plasmid pXSC4, after 
transformation and isolation from E. coli, is verified by the observation of fragments of 0.29 
kb, 2.2 kb, and 12.4 kb after EcoRI digestion. 

C. Construction of Plasmid pXC4 (see FIG. 13): The 6.9 kb NsiI-XbaI segment of 
pKC4 that carries all of the eryC genes is isolated and ligated into NsiI-XbaI digested pX1, 
prepared as described in Example 3(C)(ii), to give the desired plasmid pXC4, 11.5 kb in size, 
wherein all of the eryC genes are transcriptionally linked downstream of the ermE* promoter 
on a Streptomyces plasmid. After transformation and isolation of the plasmid from 
Streptomyces lividans 1326, the identity of plasmid pXC4 is verified by the observation of 
fragments of 0.29 kb, 2.2 kb, and 9.0 kb after EcoRI digestion. 

D. Construction of Streptomyces violaceoniger NRRL 2834(pXC4): Approximately 
500 ng of the plasmid pXC4, isolated from Streptomyces lividans 1326(pXC4), are 
electroporated into the lankamycin producer Streptomyces violaceoniger NRRL 2834 and 
several of the resulting ThioR colonies that appear on the R3M-agar plates containing 
thiostrepton are analyzed for their plasmid content. The presence of plasmid pXC4 is verified 
by the observation of fragments of 0.29 kb, 2.2 kb, and 9.1 kb in size after EcoRI digestion 
of the plasmid. 

E. Isolation, purification, and properties of 5-des-chalcosyl-5-desosaminoyl 
lankamycin: S. violaceoniger NRRL 2834(pXC4) is fermented for 5 days in SCM media 
with thiostrepton selection as described in General Methods. The fermentation broth is then 
cooled to 4°C and adjusted to pH 4.0 and extracted once with methylene chloride. The 
aqueous layer is readjusted to pH 9.0 and extracted twice with methylene chloride and the 
combined extracts are concentrated to a solid residue. This is digested in methanol and 
chromatographed over a column of Sephadex LH-20 in methanol. Fractions are tested for 
bioactivity against a sensitive organism, such as Staphylococcus aureus ThR, and active 
fractions are combined. The combined fractions are concentrated and the residue is digested 
in 10 ml of the upper phase of a solvent system consisting of n-heptane, benzene, acetone, 
isopropanol, 0.05 M, pH 7.0 aqueous phosphate buffer (5:10:3:2:5, v/v/v/v/v), and 
chromatographed on an Ito Coil Planet Centrifuge in the same system. Active fractions are 
combined, concentrated and partitioned between methylene chloride and dilute ammonium 
hydroxide (pH 9.0). The methylene chloride layer is separated and concentrated to yield the 
desired product as a white foam. 

Although the present invention is illustrated in the examples listed above in terms of 
preferred embodiments, these examples are not to be regarded as limiting the scope of the 
invention. The above illustrations serve to describe the principles and methodologies 
involved in creating the types of genetic alterations that can be introduced into Sac. erythraea 
and/or other Streptomyces that result in the synthesis of novel glycosylation-modified 
polyketide products. Although a single Type I alteration, leading to the production of, for 
example, 4"-deoxy-4"-oxo-erythromycin A, is specified herein, it is obvious to those skilled 
in the art that other Type I changes can be introduced into the eryB and/or eryC genes leading 
to novel glycosylation-modified polyketide structures. Examples of additional Type I 
alterations leading to useful novel compounds include but are not limited to: mutations in the 
eryBVII gene conceivably leading to 
3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B and mutations in the 
eryCVI gene conceivably leading to N-3α'-des-dimethyl 
erythromycin A. Moreover, it is obvious that Type I alterations in two or more different eryB 
and/or eryC genes can be combined leading to novel glycosylation-modified polyketide 
structures. Examples of combinations of two Type I alterations leading to useful compounds 
include but are not limited to: mutations in the eryBIV and eryBVII genes conceivably leading 
to 3-α-D-4"-deoxy-4"-oxo-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B; 
mutations in the eryBIV and eryCVI genes conceivably leading to 
4"-deoxy-4"-oxo-(N-3α-des-dimethyl)-erythromycin A; and mutations in the eryBIV, eryBVII, 
and eryCVI genes conceivably leading to 
3-α-D-4"-deoxy-4"-oxo-mycarosyl-5-β-D-(N-3α-des-dimethyl)-desosaminoyl-12-hydroxy-erythronolide B. 
All Type I mutations or combinations of two or more Type I mutations in the eryBII, eryBIV, 
eryBV, eryBVI, eryBVII, eryCII, eryCIII, eryCIV, eryCV, or eryCVI genes, the Sac. erythraea 
strains that carry said mutations or 
combinations of mutations, and the corresponding polyketides produced from said strains, 
therefore, are included within the scope of the present invention. 

Although the Type II mutation specified herein was constructed with the eryBVII gene 
on a self-replicating plasmid it is obvious that other eryB genes and eryC genes can be 
expressed in an antisense orientation leading to novel glycosylation-modified polyketide 
structures. Examples of additional Type II alterations leading to useful compounds include 
but are not limited to: antisense expression of the eryBIV gene conceivably leading to 
4"-deoxy-4"-oxo-erythromycin A and antisense expression of the eryCVI gene conceivably 
leading to N-3α'-des-dimethyl erythromycin A. Moreover, it will occur to those skilled in the 
art that promoters other than the ermE* promoter, for example the melC promoter of pIJ702, 
will be suitable for antisense expression, and that many self-replicating vectors in addition to 
pWHM4 will function to carry the antisense alteration. It will also occur to those skilled in 
the art that a self-replicating vector is not required for this invention and that the antisense 
alteration can be introduced directly into the chromosome using the same principles 
employed to construct a Type I gene alteration. An example of a Type II alteration that is 
introduced directly into the chromosome is the eryBVII antisense alteration described in 
Example 2 wherein DNA segments immediately upstream of the eryK gene are used to flank 
the ermE*-eryBVII-phage fd terminator grouping in a pWHM3 vector, and this vector is 
integrated into and then resolved from the chromosome leaving the ermE*-eryBVII-phage fd 
terminator grouping stably incorporated into this nonessential region of the chromosome of 
Sac. erythraea conceivably leading to the production of 
3-α-D-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B. All Type II mutations in the 
eryBII, eryBIV, eryBV, eryBVI, eryBVII, eryCII, eryCIII, eryCIV, eryCV, or eryCVI genes, 
whether carried on a self-replicating plasmid or integrated into a nonessential region of the 
chromosome, the Sac. 
erythraea strains that carry said mutations, and the corresponding polyketides produced from 
said strains, therefore, are included within the scope of the present invention. 

Although Type III alterations, leading to the production of 5-des-chalcosyl-5- 
desosaminoyl lankamycin in Streptomyces violaceoniger and 3-des-oleandrosyl-3-mycarosyl 
oleandomycin in Streptomyces antibioticus, are specified herein, it is obvious that Type III 
alterations can be introduced into any polyketide producing microorganism leading to novel 
glycosylation modified polyketides. It will also occur to those skilled in the art that both the 
eryB and eryC genes can either be cotransformed into a polyketide producing microorganism 
or grouped together on a single vector that is introduced into a polyketide producing 
microorganism. An example of a Type III change using both the eryB and eryC genes 
together is their introduction into Streptomyces violaceoniger conceivably leading to 
3-des-(4"-O-acetylarcanosyl)-3-mycarosyl-5-des-chalcosyl-5-desosaminoyl lankamycin. Although 
the Type III alterations specified herein have indicated a specific genetic order of the eryB or 
eryC genes, it will occur to those skilled in the art that many different genetic arrangements of 
the eryB or eryC genes will produce similar results. It will also occur to those skilled in 
the art that certain arrangements of the eryB and/or eryC genes that lack one or more of the 
respective eryB and/or eryC genes will lead to the production of novel glycosylated 
polyketides in which intermediate compounds in the biosynthesis of mycarose and/or 
desosamine, respectively, such as those outlined in FIGS. 2 and 3, are attached to the 
polyketide. An example of a Type III alteration in which only a subset of the eryB and/or 
eryC genes are used is the introduction of a pXC4 derivative that lacks the eryCVI gene, 
removed by digestion of plasmid pXC4 with AflII and AvrII followed by treatment with the 
Klenow fragment of DNA polymerase I and religation, into Streptomyces violaceoniger 
leading to the production of 5-des-chalcosyl-5-(N-3α-des-dimethyl desosaminoyl) 
lankamycin. It will also occur to those skilled in the art that promoters other than 
ermE or ermE*, such as the melC promoter of plasmid pIJ702, and vectors other than 
pWHM4 or pIJ702 can also be utilized in the construction of a Type III alteration, and these 
variants are, of course, considered to be within the scope of the invention. Finally, it will also 
occur to those skilled in the art that a self-replicating vector is not required for this invention 
and that an assembly of sugar biosynthesis genes can be introduced directly into the 
chromosome of a heterologous host using the same principles employed to construct a Type I 
gene alteration once a nonessential region of the heterologous host chromosome has been 
identified. Alternatively, plasmids or bacteriophages which undergo site-specific 
recombination with host genes may also be used to introduce eryB and eryC genes into a host 
to effect Type III alterations. All Type III alterations using one or more of the eryBII, 
eryBIV, eryBV, eryBVI, eryBVII, eryCII, eryCIII, eryCIV, eryCV, or eryCVI genes, the 
polyketide producing strains that carry said alterations, and the corresponding polyketides 
produced from said strains, therefore, are included within the scope of the present invention. 

In addition, it is also possible to create combinations of Type I and Type II alterations 
such that some Type I eryB and/or eryC mutations are introduced directly into the Sac. 
erythraea chromosome in the appropriate locus, while other eryB and/or eryC genes are 
inactivated by Type II alterations using a self-replicating or integrating vector. For example, 
combination of a Type I alteration, such as a mutation in eryBIV, and a Type II alteration, 
such as transformation with pASBVII, will conceivably lead to production of 
3-α-D-4"-deoxy-4"-oxo-mycarosyl-5-β-D-desosaminoyl-12-hydroxy-erythronolide B. All 
combinations of two or more alterations of Type I and Type II, the Sac. erythraea strains that 
carry such alterations, and the glycosylated polyketides produced from such strains are 
included within the scope of the present invention. 

As an extension of the examples reported with the eryB and/or eryC genes, it is 
possible to apply the method described herein to heterologous sugar biosynthesis genes that 
are similar to the eryB and/or eryC genes. The construction of strains carrying heterologous 
sugar biosynthesis genes that lead to the production of novel glycosylated polyketides 
requires: (i) cloning of the sugar biosynthesis genes from any other glycosylated-polyketide 
producing actinomycete; (ii) determining the nucleotide sequence of the cloned gene(s); (iii) 
excising and assembling the cloned gene(s) into vectors suitable for Type I, Type II, or Type 
III alterations; and (iv) transformation of polyketide producing microorganisms and screening 
for the novel compound. Any polyketide-associated sugar biosynthesis gene can thus be 
precisely excised from the genome of a glycosylated polyketide producing microorganism 
and altered or arranged with other sugar biosynthesis genes and then introduced into the same 
or another polyketide producing microorganism to create a novel glycosylated polyketide of 
predicted structure. Thus, for example, a Type I or Type II alteration of a heterologous gene 
that is similar to an eryB and/or eryC gene, such as can be found in the eryBVII homolog for 
the synthesis of L-oleandrose in Streptomyces antibioticus, to result in the production of 
3-des-L-oleandrosyl-3-D-oleandrosyl oleandomycin is included within the scope of the present 
invention. Similarly, a Type III assembly of the genes for the synthesis of a sugar other than 
mycarose or desosamine, such as can be found in the genes for the synthesis of angolosamine 
in Streptomyces eurythermus, and their transformation into Sac. erythraea to result in the 
synthesis of 5-des-desosaminoyl-5-angolosaminoyl-erythromycin A is included within the 
scope of the present invention. 

It will occur to those skilled in the art that the Type I, Type II, and Type III genetic 
manipulations described herein and the polyketide producing microorganisms into which they 
are introduced are in no way exclusive. Hence, the choice of a convenient host and the 
choice of a Type I, Type II, or Type III alteration are based solely on the relatedness of the 
desired novel glycosylated polyketide to a natural counterpart. Therefore, Type I, Type II, 
and Type III alterations can be constructed in any polyketide producing microorganism 
employing either endogenous or exogenous sugar biosynthesis genes. Thus all Type I, Type 
II, and Type III mutations or various combinations thereof constructed in any polyketide 
producing microorganism according to the principles described herein, and the respective 
polyketides produced from such strains, are included within the scope of the present 
invention. Examples of glycosylated polyketides that can be altered by creating Type I, Type 
II, or Type III changes in the producing microorganisms include, but are not limited to, 
macrolide antibiotics such as erythromycin, tylosin, and spiramycin; aromatic polyketides 
such as daunorubicin and doxorubicin; polyenes such as candicidin and the amphotericins; 
and other complex polyketides such as avermectin. 

Whereas the novel derivatives or modifications of erythromycin described herein have 
been specified as the A derivatives, such as 4"-deoxy-4"-oxo-erythromycin A, those skilled in 
the art understand that the wild type strain of Sac. erythraea produces a family of 
erythromycin compounds, including erythromycin A, erythromycin B, erythromycin C, and 
erythromycin D. Thus, modified strains of Sac. erythraea, such as strain ERBIV, for 
example, would be expected to produce the corresponding members of the 
4"-deoxy-4"-oxo-erythromycin family, including 4"-deoxy-4"-oxo-erythromycin A, 
4"-deoxy-4"-oxo-erythromycin B, 4"-deoxy-4"-oxo-erythromycin C, and 4"-deoxy-4"-oxo-erythromycin D. 
Similarly, all other modified strains of Sac. erythraea that produce novel glycosylated 
erythromycin derivatives would be expected to produce the A, B, C, and D forms of said 
derivatives. For example, modified Sac. erythraea strains that produce 6-deoxyerythromycin, 
6,12-dideoxyerythromycin and 6,7-anhydroerythromycin would be expected to produce novel 
glycosylation-modified polyketides by introduction of the additional modification of a Type 
I, II or III change in a sugar biosynthesis gene. Therefore, all members of the family of each 
of the novel erythromycins described herein or produced by these methods are included 
within the scope of the present invention. 

Variations and modifications of the methods for obtaining the desired plasmids, hosts 
for cloning and choices of vectors and eryB and/or eryC genes to clone and modify, other 
than those described herein, will occur to those skilled in the art. For example, although we 
have described the use of plasmids pWHM3, pWHM4, and pIJ702, other vectors can be 
employed wherein all or part of said plasmids is replaced by other DNA segments that 
function in a similar manner, such as replacing the pUC19 component of pWHM3 and 
pWHM4 with pBR322, available from BRL; or employing different segments of the pIJ101 
replicon in pWHM3 and pIJ702, or the pJV1 replicon in pWHM4, respectively; or employing 
selectable markers other than thiostrepton- or ampicillin-resistance. These are just a few of a 
long list of possible examples, all of which are included within the scope of the present 
invention. Similarly, the segments of the eryB and eryC loci that have been specified herein 
to generate the various Type I, Type II, and Type III alterations can readily be replaced by 
other segments of different length encoding the same functions, either produced by 
PCR-amplification of genomic DNA or of an isolated clone, or by isolating suitable restriction 
fragments from Sac. erythraea. In the same way it is possible to create Type I mutations 
functionally equivalent to those described herein by altering through deletion, insertion, or 
site-directed mutagenesis different portions of the corresponding genes. It is also possible to 
create Type II mutations functionally equivalent to those described herein by employing 
larger or smaller portions of the corresponding genes; and it is possible to create Type III 
mutations using larger or smaller segments of the corresponding genes in the same or 
different linear order described herein. Additional modifications include changes in the 
restriction sites used for cloning or in the general methodologies described above. All such 
changes are included in the scope of the present invention. It will also occur to those skilled 
in the art that different methods are available to ferment Sac. erythraea and other polyketide 
producing microorganisms and to extract the novel polyketides specified herein, and all such 
methods are also included within the scope of this invention. 

It will also be apparent that many modifications and variations of the invention as set 
forth herein are possible without departing from the spirit and scope thereof, and that, 
accordingly, such limitations are imposed only as indicated by the appended claims. 

We claim: 

1. An isolated single or double stranded polynucleotide having a nucleotide sequence 
which comprises (a) a nucleotide sequence selected from the group consisting of (i) the 
sense sequence of SEQ ID NO:1 from about nucleotide position 54 to about nucleotide 
position 1136; (ii) the sense sequence of SEQ ID NO:1 from about nucleotide position 1147 
to about nucleotide position 2412; (iii) the sense sequence of SEQ ID NO:1 from about 
nucleotide position 2409 to about nucleotide position 3410; (iv) the sense sequence of SEQ 
ID NO:2 from about nucleotide position 80 to about nucleotide position 1048; (v) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 1048 to about nucleotide position 
2295; (vi) the sense sequence of SEQ ID NO:2 from about nucleotide position 2348 to about 
nucleotide position 3061; (vii) the sense sequence of SEQ ID NO:2 from about nucleotide 
position 3214 to about nucleotide position 4677; (viii) the sense sequence of SEQ ID NO:2 
from about nucleotide position 4674 to about nucleotide position 5879; (ix) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 5917 to about nucleotide position 
7386; and (x) the sense sequence of SEQ ID NO:2 from about nucleotide position 7415 to 
about nucleotide position 7996; 

(b) sequences complementary to the sequences of (a); 

(c) sequences that, on expression, encode a polypeptide encoded by the 
sequences of (a); and 

(d) analogous sequences that hybridize under stringent conditions to the 
sequences of (a). 

2. The polynucleotide of claim 1 that is a DNA molecule or RNA molecule. 

3. The polynucleotide of claim 2 wherein the nucleotide sequence is the nucleotide 
sequence of (a) selected from the group consisting of (i) the sense sequence of SEQ ID NO:1 
from about nucleotide position 54 to about nucleotide position 1136; (ii) the sense sequence 
of SEQ ID NO:1 from about nucleotide position 1147 to about nucleotide position 2412; (iii) 
the sense sequence of SEQ ID NO:2 from about nucleotide position 2348 to about nucleotide 
position 3061; (iv) the sense sequence of SEQ ID NO:2 from about nucleotide position 4674 
to about nucleotide position 5879; and (v) the sense sequence of SEQ ID NO:2 from about 
nucleotide position 5917 to about nucleotide position 7386. 

4. The polynucleotide of claim 2 wherein the nucleotide sequence is the nucleotide 
sequence of (a) selected from the group consisting of (i) the sense sequence of SEQ ID NO:1 
from about nucleotide position 2409 to about nucleotide position 3410; (ii) the sense 
sequence of SEQ ID NO:2 from about nucleotide position 80 to about nucleotide position 
1048; (iii) the sense sequence of SEQ ID NO:2 from about nucleotide position 1048 to about 
nucleotide position 2295; (iv) the sense sequence of SEQ ID NO:2 from about nucleotide 
position 3214 to about nucleotide position 4677; and (v) the sense sequence of SEQ ID NO:2 
from about nucleotide position 7415 to about nucleotide position 7996. 

5. The polynucleotide of claim 2 wherein the nucleotide sequence is the nucleotide 
sequence of (a) having the sense sequence of SEQ ID NO:2 from about nucleotide position 
80 to about nucleotide position 1048. 

6. A vector comprising the DNA molecule of claim 2. 

7. The vector of claim 6 further comprising an enhancer-promoter operatively linked to 
the polynucleotide. 

8. The vector of claim 6 wherein the polynucleotide has the nucleotide sequence of 
claim 5. 

9. A host cell transformed with the vector of claim 6 or claim 7 or claim 8. 

10. The transformed host cell of claim 9 that is a bacterial cell. 

11. The transformed host cell of claim 10 wherein the bacterial cell is selected from the 
group consisting of Streptomyces and E. coli. 

12. A method for directing the biosynthesis of specific glycosylation-modified 
polyketides by genetic manipulation of a polyketide-producing microorganism, said method 
comprising the steps of: 

(1) isolating a sugar biosynthesis gene-containing DNA sequence according to claim 1; 

(2) identifying within said gene-containing DNA sequence one or more DNA 
fragments responsible for the biosynthesis of a polyketide-associated sugar or its attachment 
to a polyketide; 

(3) creating one or more specified changes in said DNA fragment or fragments, 
thereby resulting in an altered DNA sequence; 

(4) introducing said altered DNA sequence into a polyketide-producing 
microorganism to replace the original sequence, said altered DNA sequence, when translated, 
resulting in altered enzymatic activity capable of effecting the production of said specific 
glycosylation-modified polyketide; 

(5) growing a culture of said altered polyketide-producing microorganism under 
conditions suitable for the formation of said specific glycosylation-modified polyketide; and 

(6) isolating said specific glycosylation-modified polyketide from said culture. 

13. The method of claim 12 wherein said specified change in said DNA fragment or 
fragments results in the inactivation of at least one enzymatic activity involved in the 
biosynthesis of a polyketide-associated sugar or in its attachment to a polyketide. 

14. The method of claim 13 wherein said polyketide-associated sugar is L-mycarose. 

15. The method of claim 13 wherein said polyketide-associated sugar is D-desosamine. 

16. A method for directing the biosynthesis of specific glycosylation-modified 
polyketides by genetic manipulation of a polyketide-producing microorganism, said method 
comprising the steps of: 

(1) isolating a sugar biosynthesis gene-containing DNA sequence according to claim 1; 

(2) identifying within said gene-containing DNA sequence one or more DNA 
fragments responsible for the biosynthesis of a polyketide-associated sugar or its attachment 
to a polyketide; 

(3) reversing the strand orientation of said DNA fragment or fragments, thereby 
resulting in an altered DNA sequence which, when transcribed, results in production of an 
antisense mRNA; 

(4) introducing said altered DNA sequence into a polyketide-producing 
microorganism having an mRNA capable of binding to said antisense mRNA to produce an 
altered polyketide-producing microorganism capable of producing said specific 
glycosylation-modified polyketide; 

(5) growing a culture of said altered polyketide-producing microorganism under 
conditions suitable for the formation of said specific glycosylation-modified polyketide; and 

(6) isolating said specific glycosylation-modified polyketide from said culture. 

17. A method for directing the biosynthesis of specific glycosylation-modified 
polyketides by genetic manipulation of a polyketide-producing microorganism, said method 
comprising the steps of: 

(1) isolating a sugar biosynthesis gene-containing DNA sequence according to claim 1; 

(2) identifying within said gene-containing DNA sequence one or more DNA 
fragments responsible for the biosynthesis of a polyketide-associated sugar or its attachment 
to a polyketide; 

(3) introducing said DNA fragment or fragments into a distinct polyketide-producing 
microorganism to produce an altered polyketide-producing microorganism capable of 
producing said specific glycosylation-modified polyketide; 

(4) growing a culture of said polyketide-producing microorganism containing said 
DNA fragment or fragments under conditions suitable for the formation of said specific 
glycosylation-modified polyketide; and 

(5) isolating said specific glycosylation-modified polyketide from said culture. 

18. The method of claim 13 or claim 16 or claim 17 wherein said DNA fragment 
comprises one or more genes which encode an enzymatic activity involved in the 
biosynthesis of L-mycarose or in its attachment to a polyketide. 

19. The method of claim 13 or claim 16 or claim 17 wherein said DNA fragment 
comprises one or more genes which encode an enzymatic activity involved in the 
biosynthesis of D-desosamine or in its attachment to a polyketide. 

20. The method of claim 13 or claim 16 or claim 17 wherein said DNA fragment is the 
sequence of claim 8. 

21. An isolated polypeptide having an amino acid sequence encoded by a nucleotide 
sequence selected from the group consisting of the sense sequence of SEQ ID NO:1 from 
about nucleotide position 54 to about nucleotide position 1136; the sense sequence of SEQ ID 
NO:1 from about nucleotide position 1147 to about nucleotide position 2412; the sense sequence 
of SEQ ID NO:1 from about nucleotide position 2409 to about nucleotide position 3410; the 
sense sequence of SEQ ID NO:2 from about nucleotide position 80 to about nucleotide 
position 1048; the sense sequence of SEQ ID NO:2 from about nucleotide position 1048 to 
about nucleotide position 2295; the sense sequence of SEQ ID NO:2 from about nucleotide 
position 2348 to about nucleotide position 3061; the sense sequence of SEQ ID NO:2 from 
about nucleotide position 3214 to about nucleotide position 4677; the sense sequence of SEQ 
ID NO:2 from about nucleotide position 4674 to about nucleotide position 5879; the sense 
sequence of SEQ ID NO:2 from about nucleotide position 5917 to about nucleotide position 
7386; and the sense sequence of SEQ ID NO:2 from about nucleotide position 7415 to about 
nucleotide position 7996. 

22. An isolated polypeptide of claim 21 encoded by the sequence of SEQ ID NO:2 from 
about nucleotide position 80 to about nucleotide position 1048. 

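
The coordinate ranges recited in claims 1 and 21 can be tabulated and sanity-checked with a short script. The following is a minimal sketch, not part of the claimed subject matter: it assumes the positions are 1-based and inclusive, and the names CLAIMED_REGIONS, region_summary and summarize are illustrative only. Because the claims recite each boundary as "about" a position, the figures it prints are approximate guides rather than exact limits.

```python
# Coordinates transcribed from claims 1 and 21 (assumed 1-based, inclusive).
# All names below are illustrative; none of them come from the source text.
CLAIMED_REGIONS = {
    "SEQ ID NO:1": [(54, 1136), (1147, 2412), (2409, 3410)],
    "SEQ ID NO:2": [
        (80, 1048), (1048, 2295), (2348, 3061), (3214, 4677),
        (4674, 5879), (5917, 7386), (7415, 7996),
    ],
}


def region_summary(start, end):
    """Return (length in nt, whole codons, in-frame flag) for an inclusive range."""
    length = end - start + 1
    return length, length // 3, length % 3 == 0


def summarize(regions=CLAIMED_REGIONS):
    """Flatten the region table into rows of (seq_id, start, end, nt, codons, in_frame)."""
    rows = []
    for seq_id, spans in regions.items():
        for start, end in spans:
            rows.append((seq_id, start, end) + region_summary(start, end))
    return rows


if __name__ == "__main__":
    for seq_id, start, end, length, codons, in_frame in summarize():
        print(f"{seq_id} {start}-{end}: {length} nt, "
              f"{codons} codons, in frame: {in_frame}")
```

Each recited range spans a whole number of codons under this reading. The codon count includes the terminating stop codon wherever the range ends on one, as the first region of SEQ ID NO:1 does in the figure.
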
1 CACGCCGACGCGATCGCGCGGCACATCGACGCCTGGCTGGGCGGAGGGAATTCATGACCA 60 

M T T 

6 1 CGACCGATCGCGCCGGGCTGGGCAGGCAGCTCCAGATGATCCGCGGCCTGCACTGGGGTT 120 
TDRAGLGRQLQMIRGLHWGY 

121 ACGGCAGCAACGGCGACCCTTACCCGATGCTGCTGTGCGGACACGACGACGACCCGCAGC 180 
GSNGDPYPMLLCGHDDDPQR 


1 8 1 GCCGGTACCGCTCGATGCGCGAGTCCGGTGTGCGGCGCAGGACCGAGACGTGGGTGGTGG 240 
RYRSMRESGVRRRTETWVVA 

241 CCGACCACGCCACCGCCCGGCAGGTGCTCGACGACCCCGCGTTCACCCGCGCCACCGGAC 300 
DHATARQVLDDPAFTRATG R 

301 GCACACCGGAATGGATGCGGGCCGCGGGCGCGCCACCCGCCGAGTGGGCCCAGCCGTTCC 360 
TPEWMRAAGAPPAEWAOPFR 

361 GGGACGTGCACGCCGCGTCCTGGGAAGGCGAGGTCCCCGACGTCGGGGAACTGGCGGAGA 420 
DVHAASWEGEVPDVG ELAES 

421 GCTTCGCCGGTCTGCTCCCCGGCGCGGGCGCGCGGCTGGACCTGGTCGGCGACTTCGCCT 480 
FAGI*LPGAGARLD I*VGDFAW 

481 GGCAGGTACCGGTGCAGGG^TGACCGCCGTGCTCGGCGCAGCCGGAGTGCTGCGCGGCG 540 
QVPVQGMTAVLGAAGVLRGA 


541 CCGCGTGGGACGCCCGCGTCAGCCTGGACGCCCAGCTCAGCCCGCAGCAGCTCGCGGTGA 600 
AWDARVSLDAQLSPQQ LAVT 

601 CCGAAGCAGCGGTCGCGGCACTGCCCGCCGACCCCGCACTGCGCGCCCTGTTCGCCGGGG 660 
EAAVAALPADPALRALFAGA 

6 61 CCGAGATGACCGCGAACACCGTGGTCGACGCGGTCCTGGCCGTCTCGGCCGAACCGGGGC 720 

EMTANTVVDAVLAVSAEPGL 

721 TGGCCGAACGGATCGCCGACGACCCCGCCGCCGCGCAGCGAACCGTCGCCGAGGTGCTGC 780 
AERIADDPAAAQRTVAEVLR 

7 8 1 GCCTGCACCCGGCATTGCACCTGGAGCGGCGCACGGCCACCGCAGAGGTGCGGCTCGGCG 840 

LHPALHLERRTATAEVRLGE 

841 AGCACGTGATCGGCGAAGGCGAGGAGGTCGTGGTCGTCGTCGCGGCGGCCAACCGCGACC 900 
HVI GEGEEVVVVVAAANRDP 

901 CGGAGGTCTTCGCCGAGCCCGACCGCCTCGACGTGGACCGCCCCGACGCCGACCGCGCGC 960 
EVFAEPDRLDVDRPDADRAi 



FIG. 4A-1 
961 TGTCGGCACATCGCGGCCACCCCGGCAGGCTGGAGGAGCTGGTCACCGCGCTCGCCACCG 1020 
SAHRGHPGRLEELVTALATA 


1021 CCGC ACTGCGGGCCGCGGCC AAGGCGCTGCCCGG ACTC ACGCCCAGCGGCCCGGTCGTCC 1080 
ALRAAAKALPGLTP SGPVVR 

1081 GGCGCCGCCGATC ACCCGTCCTGCGGGGAACC AACCGCTGCCCCGTCGAGCTCTGAGGAT 1140 
RRRSPVLRGTNRCPVEL* 


1141 TCCGCGATGCGCGTCGTCTTCTCCTCCATGGCCAGCAAGAGCCACCTCTTCGGCCTCGTC 1200 
MRVVFSSMAS KS HLFGLV 

1201 CCCCTCGCATGGGCGTTCCGCGCGGCGGGGCACGAGGTCCGCGTGGTCGCGTCCCCGGCG 1260 
PLAWAFRAAGHEVRVVASPA 

1261 CTCACCGAGGACATCACCGCGGCCGGGCTGACCGCCGTCCCGGTCGGCACCGACGTCGAC 1320 
LTED ITAAGLTAVP VGTDVD 


1321 CTCGTGGACTTC ATGACCCACGCGGGCCACGACATCATCGACTACGTCCGGAGCCTGGAC 1380 
LVDFMTHAGHDIIDYVRSLD 


1381 TTC AGCGAGCGGGACC CCGCCACCTTGACCTGGGAGC ACCTGCGGGGCATGCAG ACCGTG 1440 
FSERDPATLTWE HLRGMQTV 

1441 CTCACCCCGACCTTCT ACGCCCTG ATGAGCCCGG AC ACGCTC ATCGAAGGCATGGTCTCG 1500 
LTPTFYALMSPDTL IEGMVS* 


1501 TTCTGCCGGAAGTGGCGGCCCGACCTGGTCATCTGGGAGCCGCTCACCTTCGCCGCGCCC 1560 
FCRKWRPDLVIWEP LTFAAP 


1561 ATCGCGGGCGCGGTGACCGGAACGCCGCACGCGCGGCTGCTGTGGGGACCCGACATCACC 1620 
IAGAVTGTPHARLLWGPDIT 

1621 ACCCGGGCGCGGCAGAACTTCCTCGGCCTGCTGCCCGACCAGCCGGAGGAGCACCGGGAG 1680 
TRARQNFLGLLPDQPEEHRE 


1681 GGCCCGCTCGCCGAGTGGCTCACCTGGACGCTGG AGAAGT ACGGCGGCCCGGCCTT CGAC 1740 
GPLAEWLTWTLEKYGGPAFD 


17 41 GAGGAGGTGGTCGTCGGGCAGTGGACGATCGACCCCGCCCCGGCCGCGATCAGGCTCGAC 1800 
EEVVVGQWTIDPAPAAIRLD 

1801 ACCGGCCTGAAGACCGTCGGGATGCGCTACGTCGACTACAACGGGCCGTCCGTGGTGCCG 1860 
TGLKTVGMRYVDYNGPSVVP 

1861 GAATGGCTGC ACGACG AGCCCG AGCGCCGCCGCGTGTGCCTC ACGCTCGGGATCTCCAGC 1920 
EWLHDEPERRRVCLTLGI SS 
1921 CGCGAGAACAGCATCGGGCAGGTCTCCATCGAGGAGCTGCTGGGTGCCGTCGGCGACGTC 1980 
RENSIGQVSIEELLGAVGDV 


1981 GACGCCGAGATC ATCGCGACCTTCGACGCGC AGCAGCTAGAAGGCGTCGCGAACATCCCG 2040 
DAEI IATFDAQQLEGVANIP 

2041 CACAACGTCCGC ACGGTCGGCTTCGTCCCGATGCACGCGCTGCTGCCG ACCTGCGCGGCG 2100 
HNVRTVGFVPMHALLPTCAA 


2101 ACGGTGCACCACGGCGGACCCGGGAGCTGGCACACCGCGGCGATCCACGGCGTGCCGCAG 2160 
TVHHGGPGSWHTAAIHGVPQ 


2161 GTGATCCTGCCCGACGGCTGGGACACCGGCGTGCGCGCGCAGCGCACGCAGGAATTCGGG 2220 
VILPDGWDTGVRAQRTQEFG 


2221 GCGGGGATCGCGCTGCCCGTGCCCGAGCTGACCCCCGACCAGCTCCGGGAGTCGGTGAAG 2280 
AGIALPVPELTPDQLRESVK 

2281 CGGGTCCTCGACGACCCGGCCCACCGCGCCGGCGCGGCGCGGATGCGCGACGACATGCTC 2340 
RVLD DP AHRAGAARMRDDML 


2341 GCGGAGCCGTCACCGGCCGAGGTCGTCGGCATCTGCGAGGAACTGGCCGCAGGAAGGAGA 2400 
AEP S PAEVVG I C . E E LAAGRR 

2401 GAACCACGATGACC ACCGACGQCGCGACGCACGTGCGGCTCGGGCGTTCCGCGCTGCTCA 2460 
E P R * 

MTTDAATHVRLGRSALLT 


2461 CCAGCAGGCTCTGGCTCGGCACGGTGAACTTCAGCGGACGCGTCGAGGACGACGACGCGC 2520 
SRLWLGTVNFSGRVEDDDAL 


2521 TGCGCCTGATGGACCACGCCCGGGACCGCGGCATCAACTGCCTCGACACCGCCGACATGT 2580 
RLMDHARDRGINCLDTADMY 

2581 ACGGCTGGCGGCTCTACAAGGGCCACACCGAGGAGCTGGTGGGCAGGTGGCTGGCCCAGG 2640 
GWRLYKGHTEELVGRWLAQG 

2641 GCGGCGGACGGCGCGAGG AC ACCGTGCTGGCGACCAAGGTCGGCGGCGAGATGAGCG AGO 2700 
GGRREDTVLATKVGGEMSER 

2701 GCGTCAACGACAGCGGGCTGTCGGCGCGGCACATCATCGCCTCCTGCGAGGGATCGCTGC 2760 
VNDSGLSARHIIASCEGSLR 

2761 GCAGGCTGGGCGTCGACCACATCGACGTCTACCAGATGCACCACATCGACCGGTCCGCGC 2820 
RLGVDHIDVYQMHHIDRSAP 

2821 CGTGGGACGAGGTGTGGCAGGCCATGGACAGCCTCGTCGCCAGCGGCAAGGTCTCCTACG 2880 
WDEVWQ AMDS LVA SGKVS Y V 
2881 TCGGCTCGTCGAACTTCGCGGGCTGGCACATCGCCGCCGCGCAGGAGAACGCCGCCCGCC 2940 
GSSNFAGWHIAAAQENAARR 

2941 GCCACTCCCTGGGCATGGTCTCCCACCAGTGCCTGTACAACCTGGCGGTCCGGCACGCCG 3000 
HSLGMVS.HQCLYNLAVRHAE 

3001 AGCTGGAGGTGCTGCCCGCCGCGC AGGCCT ACGGGCTCGGCGTCTTCGCCTGGTCGCCGC 3060 
LEVLPAAQAYGLGVFAWSPL. 

3061 TGCACGGCGGCCTGCTCAGCGGAGCGCTGGAGAAGCTGGCCGCGGGCACCGCGGTGAAGT 3 120 
HGGLLSGALEKLAAGTAVKS 

3121 CGGCGCAGGGCCGTGCGCAGGTGCTGTTGCCGTCCCTGCGCCCGGCGATCGAGGCCTACG 3180 
AQGRAQVLLP S I» R P A I E A Y E 

3181 AGAAGTTCTGCCGCAACCTCGGCGAAGACCCGGCCGAGGTGGGGCTCGCATGGGTGCTGT 3240 
KFCRNLGEDPAEVGLAWVLS 

3241 CCCGGCCCGGCATCGCCGGCGCCGTCATCGGCCCGCGAACCCCCGAGCAGCTCGACTCCG 3300 
RPGIAGAVIGPRTP EQLDSA 

3301 CGCTGAAGGCGTCCGCGATGACCCTGGAC^GCAGGCGCTGTCCGAACTGGACGAGATCT 3360 
LKASAMTLDEQALSELDEIP 


3361 TCCCCGCGGTGGCCTCCGGCGGCGCGGCGCCGGAAGCCTGGTTGCAGTGAGCACAAGAGG 3420 

PAVASGGAAPEAWLQ* 

3421 AACCGAGAAAGGATACGGCTGGTGAGCGTGAAGCAGAAGTCAGCGTTGCAGGACCTGGTC 3480 

3481 GACTTCGCCAAGTGGC ACGTGTGGACCAGGGTGCGGCCGTCC AGCCGTGCGCGCCTGGCC 3540 

3541 TACGAGCTGTTCGCCGACGACCACGAGGCCACGACCGAGGGCGCCTACATCAACCTCGGC 3600 



3601 TACTGGAAGCCCGGGTGCGCCGGCCTGGAGGAGGCCAACCAGGAGCTGGCGAACCAGCTC 3660 
3661 GCCGAGGCCGCGGGGATCAGCGAGGGCGACGAGGTGCTCGACGTCGGGTTCGGGCTCGGC 3720 



3721 GCGCAGGACTTCTTCTGGCTCGACCTGCAGCCAGCT 3756 
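
The figure pairs each row of nucleotides with the amino acids it encodes. As a minimal sketch of how that pairing can be reproduced, the script below translates the start of the first coding region of the sequence above (position 54 onward) using the standard genetic code. The helper names (CODON_TABLE, SEQ1_FIRST_120, translate) are illustrative assumptions, and the 120 nucleotides are copied from the first two rows as printed in the figure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Nucleotides 1-120 of the first sequence, as printed in the figure rows above.
SEQ1_FIRST_120 = (
    "CACGCCGACGCGATCGCGCGGCACATCGACGCCTGGCTGGGCGGAGGGAATTCATGACCA"
    "CGACCGATCGCGCCGGGCTGGGCAGGCAGCTCCAGATGATCCGCGGCCTGCACTGGGGTT"
)


def translate(dna, start, end):
    """Translate a 1-based, inclusive range; an incomplete trailing codon is ignored."""
    region = dna[start - 1:end].upper()
    usable = len(region) - len(region) % 3
    return "".join(CODON_TABLE[region[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Reading frame starts at the ATG at position 54.
    print(translate(SEQ1_FIRST_120, 54, 120))
```

Run on these two rows, the sketch yields MTTTDRAGLGRQLQMIRGLHWG, which agrees with the residues printed beneath them; the final Y shown in the figure is completed by nucleotides on the following row.
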
1 CGGGTTGCCGCACATCGCGCTGGGGAGATTCTTTGAATTTCGCCCGTAGCACCGACCTGG 60 



6 1 AAAGCGAGCAAATGCTCCGGTGAATGGGATCAGTGATTCCCCGCGTCAATTGATCACCCT 120 

VKGISDSPRQLITL 



121 TCTGGGCGCTTCCGGCTTCGTCGGGAGCGCGGTTCTGCGCGAGCTGCGCGACCACCCGGT 180 
LGASGFVGSAVLRELRDHPV 



181 CCGGCTGCGCGCGGTGTCCCGCGGCGGAGCGCCCGCGGTTCCGCCCGGCGCCGCGGAGGT 240 
RLRAVSRGGAPAVPPGAAEV 



241 CGAGGACCTGCGCGCCGACCTGCTGGAACCGGGCCGGGCCGCCGCCGCGATCGAGG ACGC 300 
EDLRADLLEPGRAAAAIEDA 



301 CGACGTGATCGTGCACCTGGTGGCGCACGCAGCGGGCGGTTCCACCTGGCGC AGCGCCAC 360 
DVIVHLVAHAAGGSTWRSAT 

361 CTCCGACCCGGAAGCCGAGCGGGTCAACGTCGGCCTGATGCACG ACCTCGTCGGCGCGCT 420 
S DP EAERVNVG LMHD LVGAL 



421 GC ACGATCGCCGCAGGTCG ACGCCGCCCGTGTTGCTCTACGCGAGCACCGCACAGGCCGC 480 
HDRRRSTPPVLLYASTAQAA 

481 GAACCCGTCGGCGGCCAGCAGGTACGCGCAGCAGAAGACCGAGGCCGAGCGCATCCTGCG 540 
NPSAASRYAQQKTEAERILR 

541 CAAAGCCACCGACGAGGGCCGGGTGCGCGGCGTGATCCTGCGGCTGCCCGCGGTCTACGG 600 
KATDE GRVRGVILRLPAVYG 



601 CCAGAGCGGCCCGTCCGGCCCCATGGGGCGGGGCGTGGTCGCAGCGATGATCCGGCGTGC 6 60 
QSGPSGPMGRGVVAAMIRRA 

661 CCTCGCCGGCGAGCCGCTCACCATGTGGCACGACGGCGGCGTGCGCCGCGACCTGCTGCA 720 
L A G E P LTMWHDGGVRRDLLH 



721 CGTCGAGGACGTGGCCACCGCGTTCGCCGCCGCGCTGGAGCACCACGACGCGCTGGCCGG 780 
VEDVATAFAAAIiEHH OALAG 



781 CGGCACGTGGGCGCTGGGCGCCGACCGATCCGAGCCGCTCGGCGACATCTTCCGGGCCGT 840 
GTWALGADRSEPLGDIFRAV 



841 CTCCGGCAGCGTCGCCCGGCAGACCGGC AGCCCCGCCGTCGACGTGGTCACCGTGCCCGC 900 
SGSVARQTGSPAVDVVTVPA 



901 GCCCGAGCACGCCG AGGCCAACGACTTCCGCAGCGACGACATCGACTCCACCGAGTTCCG 960 
P EHAEANDFRSDDIDSTEFR 
961 CAGCCGGACCGGCTGGCGCCCCCGGGTTTCCCTCACCGACGGCATCGACCGGACGGTGGC 1020 
S RTGWRP RVS LTDG I DRTVA 


1021 CGCCCTGACCCCCACCGAGGAGCACTAGTGCGGGTACTGCTGACGTCCTTCGCGCACCGC 1080 

A I' t" r T E, E H * 

VRVLLTSFAHR 

1081 ACGCACTTCCAGGGACTGGTCCCGCTGGCGTGGGCGCTGCGCACCGCGGGTCACGACGTG 1140 
THFQGLVP LAWALRTAGHDV 

1141 CGCGTGGCCGCCCAGCCCGCGCTCACCGACGCGGTCATCGGCGCCGGTCTCACCGCGGTA 1200 
RVAAQPALTDAVIGAGLTAV 


1201 CCCGTCGGCTCCGACCACCGGCTGTTCGACATCGTCCCGGAAGTCGCCGCTCAGGTGCAC 1260 
PVGSDHRLFDIVPEVAAQVH 


1261 CGCTACTCCTTCTACCTGGACTTCTACCACCGCGAGCAGGAGCTGCACTCGTGGGAGTTC 1320 
RYSFYLDFYHREQELHSWEF 


1321 CTGCTCGGCATGCAGG AGGCCACCTCGCGGTGGGTAT ACCCGGTGGTCAAC AACGACTCC 1380 
LLGMQEATSRWVYP VVNNDS 


1381 TTCGTCGCCG AGCTGGTCGACTTCGCCCGGGACTGGCGTCCTG ACCTGGTGCTCTGGGAG 1440 
FVAELVDFARDW.RPDLVLWE 

1441 CCGTTCACCTTCGCCGGCGCCGTCGCGGCCCGGGCCTGCCGAGCCGCGCACGCCCGGCTG 1500 
PFTFAGAVAARACGAAHARL 

1501 CTGTGGGGCAGCGACCTC ACCGGCTACTTCCGCGGCCGGTTCCAGGCGCAACGCCTGCGA 1560 
LWGSDLTGYFRGRFQAQRLR 

1561 CGGCCGCCGGAGGACCGGCCGGACCCGCTGGGCACGTGGCTGACCGAGGTCGCGGGGCGC 1620 
RPPEDRPDPLGTWLTEVAGR 

1621 TTCGGCGTCGAATTCGGCGAGGACCTCGCGGTCGGGCAGTGGTCGGTCGACCAGTTGCCG 1680 
FGVEFGEDLAVGQWSVDQLP 

1681 CCGAGTTTCCGGCTGG AC ACCGGAATGG AAACCGTTGTCGCGCGGACCCTGCCCTACAAC 1740 
PSFRLDTGMETVVARTLPYN 

1741 GGCGCGTCGGTGGTTCCGGACTGGCTCAAG AAGGGC AGTGCG ACTCGACGCATCTGCATT 1800 
GASVVPDWLKKGSATRRICI 

1801 ACCGGAGGGTTCTCCGGACTCGGGCTCGCCGCCG ATGCCG ATCAGTTCGCGCGGACGCTC 1860 
TGGFSGLGLAADADQFART L 
1861 GCGCAGCTCGCGCGATTCGATGGCGAAATCGTGGTTACGGGTTCCGGTCCGGATACCTCC 1920 
AQLARFDGEIVVTGSGPDTS 

1921 GCGGTACCGGACAACATTCGTTTGGTGGATTTCGTTCCGATGGGCGTTCTGCTCCAGAAC 1980 

AVPDN irlvdfvpmgvllqn 



1981 TGCGCGGCG ATC ATCC ACCACGGCGGGGCCGGAACCTGGGCC ACGGCACTGCACCACGGA 2040 
CAAI IHHGGAGTWATALHHG 



2041 ATTCCGCAAATATCAGTTGCACATGAATGGGATTGCATGCTACGCGGCCAGCAGACCGCG 2100 
IPQI SVAHEWDCMLRGQQTA 



2101 GAACTGGGCG CGGGAATCTACCTCCGGCCGG ACGAGGTCG ATGCCG ACTC ATTGGCGAGC 2160 
ELGAGIYLRPDEVDADSLAS 



2161 GCCCTCACCCAGGTGGTCGAGGACCCCACCTACACCGAGAACGCGGTGAAGCTTCGCGAG 2220 
ALTQVVEDPT YTENAVKLRE 




2221 GAGGCGCTGTCCGACCCGACGCCGCAGGAGATCGTCCCGCGACTGGAGGAACTCACGCGC 2280 
EALS DPTPQEIVPRIiEELTR 

2281 CGCCACGCCGGCTAGCGGTTTCCGACCGACAAGTCCGTCCGACAGCACACCTCCGG AGGG 2340 
R H A G * 

2341 AGCAGGGATGTACGAGGGCGGGTTCGCCGAGCTTTACGACCGGTTCTACCGCGGCCGGGG 2400 
MYEGGFAELYDRFYRGRG 



2401 CAAGGACTACGCGGCCGAGGCCGCGCAGGTCGCGCGGCTGGTCAGAGACCGCCTGCCCTC 2460 
KDYAAEA AQVARLVRDRLPS 



2461 GGCTTCCTCGCTGCTCGACGTGGCCTGCGGGACCGGCACCCACCTGCGCCGGTTCGCCGA 2520 
ASSLLDVACGTGTHLRRFAD 



2521 CCTCTTCGACGACGTGACCGGGCTGGAGCTGTCGGCGGCGATGATCGAGGTCGCCCGGCC 2580 
LFDD VTGLELSAAMIEVARP 



2581 GCAGCTCGGCGGCATCCCGGTGCTGCAGGGCGACATGCGCGACTTCGCGCTGGATCGCGA 2640 
QLGG IPVLQGDMRDFALD.RE 



2641 GTTCGACGCCGTCACCTGCATGTTCAGCTCCATCGGGCACATGCGCGACGGCGCCGAGCT 2700 
FDAVTCMFSS I GHMRDGAEL 



2701 GGACCAGGCGCTGGCGTCCTTCGCCCGCCACCTCGCCCCCGGCGGCGTCGTGGTGGTCGA 2760 
DQALASFARHLAPGGVVVVE 



2761 ACCGTGGTGGTTCCCGGAGGACTTCCTCGACGGCTACGTGGCCGGTGACGTGGTGCGCGA 2820 
PWWFPEDFLDGYVAGDVVRD 
2821 CGGCGACCTGACGATCTCGCGCGTCTCGCACTCCGTGCGCGCCGGCGGCGCGACCCGGAT 2880 
GDLTISRVSHSVRAGGATRM 

2881 GGAGATCCACTGGGTCGTGGCCGACGCGGTGAACGGTCCGCGGCACCACGTGGAGCACTA 2940 
EIHWVVA DAVNGPRHHVEHY 


2941 CGAGATCACGCTCTTCGAGCGGCAGCAGTACGAGAAGGCCTTCACCGCGGCCGGTTGCGC 3000 
EITLFERQQYEKAFTAAGCA 

3001 TGTGCAGTACCTGGAGGGCGGACCCTCCGGACGCGGGTTGTTCGTCGGTGTGCGCGGATG 3060 
VQYLEGGPSGRGLFVGVRG* 

3061 ACCCGTGCGTCGCGTTTTCCGTTCCTGGCACAGGTGATCCGCTCCACGGGCCCTTTCCCC 3120 

3121 GCCGTGACCGGACCCTTACAGTGAGTGCGGGTCTTGATCGACAACGCCCGGCGGCAGCAA 3180 

3181 GCGGAGCCGTCGACGACACCGCAGGGAGAGTCGATGGGTGATCGGACCGGCGACCGGACG 3240 

MGDRTGDRT 


3241 ATTCCGGAATCCTCGCAGACCGCAACGCGTTTCCTGCTCGGCGACGGCGGAATCCCCACC 3300 

IPES SQTATRF LLGDGGI P T 

3301 GCCACGGCGGAAACCCACGACTGGCTGACCCGCAACGGCGCCGAGCAGCGGCTCGAGGTG 3360 
ATAETHDWLTRNGAEQRLEV 

3361 GCGCGCGTGCCGTTCAGCGCCATGGACCGCTGGTCGTTCCAGCCCGAGGACGGCAGGCTC 3420 
ARVPFSAMDRWSFQPEDGRL 

3421 GCCC ACGAGTCCGGGCGCTTCTTCTCC ATCGAGGGCCTGC ACGTGCGGACGAACTTCGGC 3480 
AHESGRFFSIEGLHVRTNFG 


3481 TGGCGGCGGGACTGGATCCAGCCCATCATCGTGCAGCCCGAGATCGGCTTCCTCGGCCTC 3540 
WRRDWIQPIIVQPEIGFLGL 

3541 ATCGTCAAGGAGTTCGACGGTGTGCTGCACGTGCTGGCGCAGGCCAAGGCCGAGCCGGGC 3 600 
IVKEFDGVLHVLAQAKAEPG 

3601 AACATCAACGCCGTCCAGCTCTCCCCGACCCTGCAGGCGACCCGCAGCAACTACACCGGC 3660 
NINAVQLSPTLQATRSNYTG 


3661 GTCCACCGCGGCTCGAAGGTCCGGTTCATCGAGTACTTCAACGGCACGCGCCCGAGCCGG 3720 

VHRGSKVRFIEYFNGTRP SR 

3721 ATCCTCGTCG ACGTGCT CCAGTCCGAGC AGGGCGCGTGGTTCCTGCGC AAGCGC AACCGG 3780 
ILVDVLQSEQGAWFLRKRNR 
3781 AACATGGTCGTCGAGGTGTTCGACGACCTGCCCGAGCACCCGAACTTCCGGTGGCTGACC 3840 
NMVVEVFDDLPEHPNFRWLT 

3841 GTCGCGCAGCTGCGGGCGATGCTGCACCACGACAACGTGGTGAACATGGACCTGCGCACC 3900 
VAQLRAMLH HDNVVNMDLRT 

3901 GTGCTGGCCTGCGTCCCGACCGCCGTGGAGCGGGACCGGGCCGACGACGTGCTCGCGCGC 3960 
VLACVP TAVERDRADDVLAR 



3961 CTGCCCGAGGGCTCGTTCCAGGCCCGGCTGCTGCACTCGTTC ATCGGCGCGGGCACCCCG 4020 
LPEGSFQARLLHSFIGAGTP 

4021 GCCAACAACATGAACAGCCTGCTGAGCTGGATCTCCGACGTGCGCGCCAGGCGCGAGTTC 4080 
ANNMNSLLSWI SDVRARREF 




4081 GTGCAGCGCGGCCGCCCGCTGCCCGACATCGAGCGCAGCGGGTGGATCCGCCGCGACGAC 4140 
VQRGRPLPDIERSGWIRRDD 

4141 GGCATCGAGCACGAGGAG AAGAAGTACTTCGACGTCTTCGGCGTCACGGTGGCG ACCAGC 4200 
GIEHEEKKYFDVFGVTVATS 



4201 GACCGCGAGGTCAACTCGTGGATGCAGCCGCTGCTCTCGCCCGCCAACAACGGCCTGCTC 4260 
DREVNSWMQP LLS PANNGLL 

42 61 GCCCTGCTGGTCAAGGACATCGGCGGCACGTTGCACGCGCTCGTGCAGCTGCGCACCGAG 4320 
ALLVKD I GGTLHAI»VQLRTE 



4321 GCGGGCGGGATGGACGTCGCCGAGCTGGCGCCTACGGTGCACTGCCAGCCCGACAACTAC 4380 
AG GMDVAELAP TVHCQPDNY 



4381 GCCGACGCGCCCGAGGAGTTCCGACCGGCCTATGTGGACTACGTGTTGAACGTGCCGCGC 4440 
ADAPEEFRPAYVDYVLNVPR 



4441 TCGCAGGTCCGCTACGACGCATGGCACTCCGAGGAGGGCGGCCGGTTCTACCGCAACGAG 4500 
SQVRYDAWHSEEGGRF YRNE 


4501 AACCGGTACATGCTGATCGAGGTGCCCGCCGACTTCGACGCCAGTGCCGCTCCCGACCAC 4560 
NRYMLX EVPADFDASAAPDH 



4561 CGGTGGATGACCTTCGACCAGATCACCTACCTGCTCGGGCACAGCCACTACGTCAACATC 4 620 
RWMTFDQITYLLGHSHYVNI 



4621 CACGTGCGCAGCATCATCGCGTGCGCCTCGGCCGTCTACACCAGGACCGCCGGATGAAAC 4 680 
HVRSIIACASAVYTRTAG* 

M K R 



4681 GCGCGCTGACCGACCTGGCGATCTTCGGCGGCCCCGAGGCATTCCTGCACACCCTCTACG 4740 
ALTDLAIFGGPEAFLHTLYV 
4741 TGGGCAGGCCGACCGTCGGGGACCGGGAGCGGTTCTTCGCCCGCCTGGAGTGGGCGCTGA 4800 
GRPTVGDRERFFARLEWALW 

4801 AC AACAACTGGCTG ACC AACGGCGG ACCACTGGTGCGCGAGTTCG AGGGCCGGGTCGCCG 4860 
NNWLTNGGPLVREFEGRVAD 

4861 ACCTGGCGGGTGTCCGCCACTGCGTGGCCACCTGCAACGCGACGGTCGCGCTGCAACTGG 4920 
LAGVRHCVATCNATVALQLV 

4921 TGCTGCGCGCGAGCGACGTGTCCGGCGAGGTCGTCATGCCTTCGATGACGTTCGCGGCCA 4980 
L R A S D V S GEVVMPSMTFAAT 

4981 CCGCGCACGCGGCGAGCTGGCTGGGGCTGGAACCGGTGTTCTGCGACGTGGACCCCGAGA 5040 
AHAASWLGLEPVFCDVDPET 

5041 CCGGCCTGCTCGACCCCGAGCACGTCGCGTCGCTGGTCACACCGCGGACGGGCGCGATCA 5100 
GLLDPEHVASLVTPRTGAII 


5101 TCGGCGTGCACCTCTGGGGC AGGCCCGCTCCGGTCGAGGCGCTGGAGAAGATCGCCGCCG 5160 
GVHLWGRPAPVEALE KIAAE 


51 61 AGCACCAGGTCAAACTCTTCTTCGACGCCGCGCACGCGCTGGGCTGCACCGCCGGCGGGC 5220 
HQVKLFFDAAHALGCTAGGR 

5221 GGCCGGTC GGCGCCTT CGGC AACGCCGAGGTGTT CAGCTTCCACGCCACGAAGGCGGTCA 5280 
PVGAFGNAEVFSFHATKAVT 

5281 CCTCGTTCGAGGGCGGCGCC ATCGTCACCGACGACGGGCTGCTGGCCG ACCGCATCCGCG 5340 
SFEGGAIVTDDGLLADRIRA 

5341 CC ATGCAC AACTTCGGGATCGC ACCGGACAAGCTGGTGACCGATGTCGGC ACCAACGGCA 5400 
MHNFGIAPDKLVTDVGTNGK 

5401 AGATGAGCGAGTGCGCCGCGGCGATGGGCCTCACCTCGCTCGACGCCTTCGCCGAGACCA 5460 
MS ECAAAMGLTSLD AFAETR 

54 61 GGGTGCACAACCGCCTCAACCACGCGCTCTACTCCGACGAGCTCCGCGACGTGCGCGGCA 5520 
VHNRLNHALYSDELRDVRGI 

5521 TATCCGTGCACGCGTTCGATCCTGGCGAGCAGAACAACTACCAGTACGTGATCATCTCGG 5580 
SVHAFDPGEQNNYQ YVII SV 

5581 TGGACTCCGCGGCCACCGGCATCGACCGCGACCAGTTGCAGGCG ATCCTGCGAGCGGAGA 5640 
D SAATGIDRDQLQA ILRAEK 

5641 AGGTTGTGGC AC AACCCT ACTTCTCCCCCGGGTGCC ACCAG ATGC AGCCGT ACCGG ACCG 5700 
VVAQPYFSPGCHQMQPYRTE 
5701 AGCCGCCGCTGCGGCTGGAGAACACCGAACAGCTCTCCGACCGGGTGCTCGCGCTGCCCA 5760 
PPLRLENTEQLSDRVLALPT 



57 61 CCGGCCCCGCGGTGTCCAGCGAGGACATCCGGCGGGTGTGCGACATCATCCGGCTCGCCG 5820 
GPAVSSEDIRRVCDI I R L A A 



5821 CCACCAGCGGCGAGCTGATCAACGCGCAATGGGACCAGAGGACGCGCAACGGTTCGTGAC 5880 
TSGELINAQWDQRTRNGS * 



5881 GACCTGCGCCACAAGTGCCAGGAGGTTCGCTCCCCGATGAACACAACTCGTACGGCAACC 5940 

MNTTRTAT 



5941 GCCCAGGAAGCGGGGGTCGCCGACGCGG CGCGCCCGGACGTCGACC GGCGGGCGGTCGTG " 6000 
AQEAGVADAARPDVDRRAVV 



6001 CGGGCGCTGAGCTCGGAGGTCTCCCGCGTCACCGGCGCCGGTGACGGTGACGCCCACGTG 6060 
RALSSEVSRVTGAGDGDAHV 



6061 CAGGCCGCCCGGCTCGCCGACCTCGCCGCGCACTACGGGGCGCACCCGTTCACGCCGCTG 6120 
QAARLADLAAHYGAHPFTPL 



6121 GAGCAGACGCGTGCGCGGCTCGGCCTGGACCGCGCGGAGTTCGCCCACCTGCTCGACCTG 6180 
EQTRARLGLDRAEFAHLLDL 



6181 TTCGGCCGCATCCCGGACCTGGGCACCGCGGTGGAGCACGGTCCGGCGGGCAAGTACTGG 6240 
FGRIPDLGTAVEHGPAGKYW 



6241 TCCAACACGATCAAGCCGCTGGACGCCGCAGGCGCACTGGACGCGGCGGTCTACCGCAAG 6300 
SNTIKPLDAAGALDAAVYRK 



6301 CCTGCCTTCCCCTACAGCGTCGGCCTGTACCCCGGGCCGACGTGCATGTTCCGCTGCCAC 6360 
PAFPYSVGLYPGPTCMFRCH 



6361 TTCTGCGTGCGGGTGACCGGTGCCCGCTACGAGGCCGCATCGGTCCCGGCGGGCAACGAG 6420 
FCVRVTGARYEAASVPAGNE 



6421 ACGCTGGCCGCGATCATCGACGAGGTGCCCACGGACAACCCGAAGGCGATGTACATGTCG 6480 
TLAAIIDEVPTDNPKAMYMS 



6481 GGCGGGCTCGAGCCGCTGACGAACCCCGGTCTCGGCGAGCTGGTGTCGCACGCCGCCGGG 6540 
GGLEPLTNPGLGELVSHAAG 



6541 CGCGGTTTCGACCTCACCGTCTACACCAACGCCTTCGCCCTCACCGAGCAGACGCTGAAC 6600 
RGFDLTVYTNAFALTEQTLN 



6601 CGCCAGCCCGGCCTGTGGGAGCTGGGCGCGATCCGCACGTCCCTCTACGGGCTGAACAAC 6660 
RQPGLWELGAIRTSLYGLNN 

6661 GACGAGTACGAGACGACCACCGGCAAGCGCGGCGCTTTCGAACGCGTCAAGAAGAACCTG 6720 
DEYETTTGKRGAFERVKKNL 

6721 CAGGGCTTCCTGCGGATGCGCGCCGAGCGGGACGCGCCGATCCGGCTCGGCTTCAACCAC 6780 
QGFLRMRAERDAPIRLGFNH 

6781 ATCATCCTGCCGGGACGGGCCGACCGGCTCACCGACCTCGTCGACTTCATCGCCGAGCTC 6840 
IILPGRADRLTDLVDFIAEL 

6841 AACGAGTCCAGCCCGCAACGGCCGCTGGACTTCGTGACGGTGCGCGAGGACTACAGCGGC 6900 
NESSPQRPLDFVTVREDYSG 

6901 CGCGACGACGGCCGGCTGTCGGACTCCGAGCGCAACGAGCTGCGCGAGGGCCTGGTGCGG 6960 
RDDGRLSDSERNELREGLVR 

6961 TTCGTCGACTACGCCGCCGAGCGGACCCCGGGCATGCACATCGACCTGGGCTACGCCCTG 7020 
FVDYAAERTPGMHIDLGYAL 

7021 GAGAGCCTGCGGCGGGGTGTGGACGCCGAGCTGCTGCGCATCCGGCCGGAGACGATGCGT 7080 
ESLRRGVDAELLRIRPETMR 

7081 CCCACCGCGCACCCCCAGGTCGCGGTGCAGATCGACCTGCTCGGCGACGTCTACCTCTAC 7140 
PTAHPQVAVQIDLLGDVYLY 

7141 CGCGAGGCGGGCTTCCCGGAGCTGGAGGGCGCCACCCGCTACATCGCGGGCCGGGTCACC 7200 

REAGFPELEGATRYIAGRVT 

7201 CCGTCGACCAGCCTGCGCGAGGTGGTGGAGAACTTCGTGCTGGAGAACGAGGGCGTGCAG 7260 
PSTSLREVVENFVLENEGVQ 

7261 CCCCGCCCCGGCGACGAGTACTTCCTCGACGGCTTCGACCAGTCGGTGACCGCACGGCTC 7320 
PRPGDEYFLDGFDQSVTARL 

7321 AACCAGCTCGAACGAGACATCGCCGACGGGTGGGAGGACCACCGCGGCTTCCTGCGCGGA 7380 
NQLERDIADGWEDHRGFLRG 

7381 AGGTGAACCGGAGTTGCGAGTACGTGAGCTGGCGGTGGCGGGCGGTTTCGAGTTCACCCC 7440 
R * VAGGFEFTP 

7441 CGACCCGAAGCAGGACCGGCGGGGCCTGTTCGTGTCTCCGCTGCAGGACGAGGCGTTCGT 7500 
DPKQDRRGLFVSPLQDEAFV 

7501 GGGCGCGGTGGGCCATCGGTTCCCCGTCGCCCAGATGAACCACATCGTCTCCGCCCGGGG 7560 
GAVGHRFPVAQMNHIVSARG 

7561 CGTGCTGCGCGGGCTGCACTTCACCACCACCCCGCCGGGGCAGTGCAAGTACCTCTACTG 7620 
VLRGLHFTTTPPGQCKYVYC 

7621 CGCGCGCGGCCGGGCGCTCGACGTCATCGTCGACATCCGGGTCGGCTCGCCGACGTTCGG 7680 
ARGRALDVIVDIRVGSPTFG 



7681 GAAGTGGGACGCGGTGGAGATGGACACCGAGCACTTCCGGGCGGTCTACTTCCCCAGGGG 7740 
KWDAVEMDTEHFRAVYFPRG 



7741 CACCGCGCACGCCTTCCTCGCGCTTGAGGACGACACCCTGATGTCGTACCTGGTCAGCAC 7800 
TAHAFLALEDDTLMSYLVST 



7801 GCCGTACGTGGCCGAGTACGAGCAGGCGATCGACCCGTTCGACCCCGCGCTGGGTCTGCC 7860 
PYVAEYEQAIDPFDPALGLP 



7861 GTGGCCCGCGGACCTGGAGGTCGTGCTCTCCGACCGCGACACGGTGGCCGTGGACCTGGA 7920 
WPADLEVVLSDRDTVAVDLE 



7921 GACCGCCAGGCGGCGAGGGATGCTGCCCGACTACGCCGACTGCCTCGGCGAGGAGCCCGC 7980 
TARRRGMLPDYADCLGEEPA 



7981 CAGCACCGGCAGGTGACGGGTCCCGAGCACGATCTGTTCGAAGTGGCGCAGGCGCTCGTC 8040 
STGR * 



8041 GTCGCGGTCGA 8051 



FIG. 4B-9 
FIG. 6B 

FIG. 11 